Hybrid ViT - U-Net For Building Segmentation From Satellite Imagery¶

Draft Final Project Model V9¶

By Irti Haq


Research Question: Can a hybrid ViT + U-Net model achieve high segmentation accuracy (Dice score) and low boundary error (HD95) when segmenting buildings from satellite imagery across urban areas with diverse geographic and architectural characteristics?
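The two metrics can be stated concretely. The Dice score measures region overlap between the predicted and ground-truth masks, while HD95 is the 95th percentile of boundary-to-boundary distances (left out of this sketch). A minimal NumPy sketch of the Dice score, assuming binary masks (the function name and toy arrays are illustrative, not part of the project code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(dice_score(a, a))  # perfect overlap -> 1.0
print(dice_score(a, b))  # 2*1 / (2 + 1) ~ 0.667
```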

Dataset: Chinese GaoFen-7 (GF-7) satellite imagery

This is a high-resolution building segmentation dataset. It provides extensive coverage of urban and rural areas of China, drawn from six representative Chinese cities (Chen et al., 2024). The dataset contains 5,175 pairs of 512×512 image tiles covering 170,015 buildings. Compared to other datasets built from satellite and aerial imagery, it offers varied ground-truth labels for building extraction.

Link: A benchmark GaoFen-7 dataset for building extraction from satellite images

Model: TransUNet

This model is a specialized adaptation of the TransUNet architecture, originally developed by Chen et al. (2021) for medical image segmentation. Here it has been adapted for building segmentation from satellite imagery.

The model is a hybrid ViT + U-Net that uses a Vision Transformer as its encoder and a U-Net-style decoder. More specifically, the encoder combines a CNN (ResNet) with transformer blocks, and the decoder consists of upsampling layers. Following the U-Net structure, TransUNet uses the residual network to extract features and downsample; the resulting feature maps are fed into the transformer for encoding, and upsampling then decodes the information. The model uses a pretrained Vision Transformer; specifically, this project uses the ResNet-50 + ViT-B/16 checkpoint from Google Research's Vision Transformer implementation.
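To make the encoder/decoder shape flow concrete: for a 224×224 input, the CNN backbone downsamples by 16×, the transformer operates on the resulting 14×14 token grid, and the cascaded upsampler restores full resolution. A back-of-the-envelope check in pure Python (the numbers assume the 224×224 input size used later in this notebook):

```python
img_size = 224
downsample_factor = 16                  # ResNet stem + 3 stages in the hybrid encoder
grid = img_size // downsample_factor    # 14: side of the token grid fed to the ViT
n_tokens = grid * grid                  # 196 patch tokens enter the transformer

# The decoder reshapes the tokens back to 14x14 and upsamples by 2x four times:
# 14 -> 28 -> 56 -> 112 -> 224, one stage per entry in decoder_channels.
out = grid
for _ in range(4):
    out *= 2
print(grid, n_tokens, out)  # 14 196 224
```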

Most of the code and architecture for the model is taken directly from the TransUNet project by Chen et al. (2021) and adapted for building segmentation.

Source: TransUNet: Medical Image Segmentation & TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation¶

Key Changes¶

Overall, these are some of the key changes I made to the model's architecture and training setup to improve performance for building segmentation from satellite imagery.

  1. The model optimizer was progressively changed from SGD with momentum → Adam → AdamW → AdamW with AMSGrad. These changes significantly improved training speed, and adding AMSGrad in particular helped convergence.

  2. The original use of polynomial learning rate decay led to inconsistent convergence behavior and slower training progress, particularly in the early epochs. To address this, I implemented CosineAnnealingWarmRestarts, which provided smoother and more adaptive learning rate scheduling. This change resulted in faster convergence, reduced training noise, and improved overall stability.

  3. Introduced data augmentation using the Albumentations library. A stronger set of transformations was applied initially, but it worsened test performance; the augmentations were then progressively simplified, ultimately leaving only mild geometric transforms (horizontal/vertical flips and 90° rotations) to improve generalization.

  4. Optimized various training parameters, including batch size, learning rate, number of skip connections, and number of training epochs.
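For reference, the schedule in change 2 follows η(t) = η_min + ½(η_max − η_min)(1 + cos(π·T_cur/T_i)), restarting to η_max when T_cur reaches the cycle length. A pure-Python sketch of one fixed-length cycle (the values of T_0, eta_max, and eta_min here are illustrative, not the ones used in training):

```python
import math

def cosine_warm_restart_lr(epoch, T_0=10, eta_max=1e-3, eta_min=1e-6):
    """Learning rate at a given epoch under cosine annealing with warm restarts.
    With a fixed cycle length T_0 every restart resets the cosine phase."""
    T_cur = epoch % T_0  # position inside the current cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_0))

print(cosine_warm_restart_lr(0))   # start of a cycle: eta_max
print(cosine_warm_restart_lr(5))   # mid-cycle: (eta_max + eta_min) / 2
print(cosine_warm_restart_lr(10))  # restart: back to eta_max
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` (which additionally supports `T_mult` to lengthen successive cycles) paired with `torch.optim.AdamW(..., amsgrad=True)` from change 1.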

In [1]:
# Loading necessary libraries
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import random
import h5py
import cv2
import numpy as np
import torch
from PIL import Image
from scipy import ndimage
from scipy.ndimage import zoom  # the scipy.ndimage.interpolation namespace is deprecated
from torch.utils.data import Dataset
import albumentations as A
from albumentations.pytorch import ToTensorV2

## Progress bar
from tqdm.notebook import tqdm

# Check for CUDA, then MPS (for Mac), then CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print("Using device:", device)
Using device: cuda
In [2]:
def random_rot_flip(image, label):
    k = np.random.randint(0, 4)
    image = np.rot90(image, k)
    label = np.rot90(label, k)
    axis = np.random.randint(0, 2)
    image = np.flip(image, axis=axis).copy()
    label = np.flip(label, axis=axis).copy()
    return image, label


def random_rotate(image, label):
    angle = np.random.randint(-20, 20)
    image = ndimage.rotate(image, angle, order=0, reshape=False)
    label = ndimage.rotate(label, angle, order=0, reshape=False)
    return image, label

class RandomGenerator(object):
    def __init__(self, output_size):
        self.output_size = output_size

    def __call__(self, sample):
        image, label = sample['image'], sample['label']

        if random.random() > 0.5:
            image, label = random_rot_flip(image, label)
        elif random.random() > 0.5:  # second independent draw: rotation fires on ~25% of samples
            image, label = random_rotate(image, label)
        x, y = image.shape
        if x != self.output_size[0] or y != self.output_size[1]:
            image = zoom(image, (self.output_size[0] / x, self.output_size[1] / y), order=3) 
            label = zoom(label, (self.output_size[0] / x, self.output_size[1] / y), order=0)
        image = torch.from_numpy(image.astype(np.float32)).unsqueeze(0)
        label = torch.from_numpy(label.astype(np.float32))
        sample = {'image': image, 'label': label.long()}
        return sample
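The key property of these paired transforms is that the identical rotation or flip is applied to image and label, so pixel alignment is preserved. A quick NumPy check of that idea (dummy arrays; same mechanism as random_rot_flip above):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 4))
label = (image > 0.5).astype(np.uint8)  # label derived from the image, so they start aligned

k = 3                        # the SAME k for both arrays, as in random_rot_flip
image_t = np.rot90(image, k)
label_t = np.rot90(label, k)

# Alignment survives the shared transform: re-thresholding the rotated image
# reproduces the rotated label exactly.
assert np.array_equal((image_t > 0.5).astype(np.uint8), label_t)
print("aligned after shared rot90")
```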

Data Loader Class¶

In [3]:
class GF7Dataset(Dataset):
    def __init__(self, image_dir, mask_dir, image_size=224, transform=None):
        self.image_paths = sorted([
            os.path.join(image_dir, f) for f in os.listdir(image_dir)
            if f.lower().endswith(('.png', '.jpg', '.jpeg', '.tif', '.tiff'))
        ])
        self.mask_paths = sorted([
            os.path.join(mask_dir, f) for f in os.listdir(mask_dir)
            if f.lower().endswith(('.png', '.jpg', '.jpeg', '.tif', '.tiff'))
        ])
        self.image_size = image_size
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        mask_path = self.mask_paths[idx]

        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # Convert BGR to RGB
        image = cv2.resize(image, (self.image_size, self.image_size)) # Resize to target size
        image = image.astype(np.float32) / 255.0 # Normalize to [0, 1] 

        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        if mask is None:
            raise FileNotFoundError(f"Could not read mask: {mask_path}")
        mask = cv2.resize(mask, (self.image_size, self.image_size)) # Resize to target size
        mask = (mask > 127).astype(np.float32)  # Binarize

        if self.transform:
            augmented = self.transform(image=image, mask=mask) 
            image = augmented['image'] 
            mask = augmented['mask']  # Already a tensor with shape [H, W] or [1, H, W]
        
        # If a transform is not provided, or if the probabilistic transform
        # was skipped, the data will still be numpy arrays.
        # This check ensures they are always converted to tensors.
        
        if not isinstance(image, torch.Tensor):
            # Apply ImageNet normalization and convert to a tensor
            image = (image - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225]) # Standardize to ImageNet
            image = torch.from_numpy(image.transpose(2, 0, 1)).float() # Convert to [C, H, W]

        if not isinstance(mask, torch.Tensor):
            mask = torch.from_numpy(mask).float()

        # Ensure mask has a channel dimension
        if mask.ndim == 2:
            mask = mask.unsqueeze(0)

        return image, mask

Testing to See if the Dataloader Is Working Correctly¶

In [4]:
# Test to see if the dataset works

from torch.utils.data import DataLoader

dataset = GF7Dataset(
    image_dir = "data/GF-7 Building (3Bands)/Train/image",
    mask_dir = "data/GF-7 Building (3Bands)/Train/label",
    image_size=224
)

loader = DataLoader(dataset, batch_size=8, shuffle=True)

# Iterate over a batch
for images, masks in loader:
    print(images.shape)  # [B, 3, 224, 224]
    print(masks.shape)   # [B, 1, 224, 224]
    break

print(f"Number of samples in dataset: {len(dataset)}")
torch.Size([8, 3, 224, 224])
torch.Size([8, 1, 224, 224])
Number of samples in dataset: 3106

Transformations¶

Heavy Transformation Pipeline¶

In [5]:
prob_pipeline = 1

tranform_pipline = A.Compose([
    
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.SomeOf([
        A.RandomBrightnessContrast(brightness_limit=0.1, contrast_limit=0.1, p=1.0),
        A.RGBShift(r_shift_limit=8, g_shift_limit=8, b_shift_limit=8, p=1.0),
        A.HueSaturationValue(hue_shift_limit=5, sat_shift_limit=8, val_shift_limit=5, p=1.0),
        A.RandomGamma(gamma_limit=(90, 110), p=1.0),
    ], n=2, p=0.8), # Change to 1
    
    
    # # Cloud Cover Simulation
    # A.SomeOf([
    #     A.RandomFog(fog_coef_range=(0.1, 0.2), alpha_coef=0.08, p=1.0),
    #     A.RandomShadow(p=1.0),
    # ], n=1, p=0.5),
    
    
    # Noise
    A.OneOf([
        A.MultiplicativeNoise(multiplier=(0.7, 1.2), per_channel=True,elementwise=True, p=1.0),
        # A.GaussNoise(p=1.0), Adding Too much noise
    ], p=0.4),
    
    #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), # Standardize to ImageNet
    #ToTensorV2(),
    
    ], p=prob_pipeline, seed=42)

Lite Transformation Pipeline¶

In [6]:
prob_pipeline = 1

lite_tranform_pipeline = A.Compose([
    
    # Light spatial transforms
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),  # random 90° rotations

    # One mild color transform per sample
    A.OneOf([
        A.RandomBrightnessContrast(brightness_limit=0.05, contrast_limit=0.05, p=1.0),
        A.HueSaturationValue(hue_shift_limit=3, sat_shift_limit=5, val_shift_limit=3, p=1.0),
        A.RGBShift(r_shift_limit=3, g_shift_limit=3, b_shift_limit=3, p=1.0),
        A.RandomGamma(gamma_limit=(98, 102), p=1.0),
    ], p=0.3),  # Lighter color shift and reduced prob

    # Light noise
    A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, elementwise=False, p=0.2),

    # Normalize + tensor
    #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], p=1.0),
    #ToTensorV2(p=1.0),
], p=prob_pipeline, seed=42)

XL Lite Transform Pipeline (Currently Used)¶

In [7]:
prob_pipeline = 1

XL_lite_tranform_pipeline = A.Compose([
    
    # Light spatial transforms
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),  # random 90° rotations

    # # One mild color transform per sample
    # A.OneOf([
    #     A.RandomBrightnessContrast(brightness_limit=0.05, contrast_limit=0.05, p=1.0),
    #     A.HueSaturationValue(hue_shift_limit=3, sat_shift_limit=5, val_shift_limit=3, p=1.0),
    #     A.RGBShift(r_shift_limit=3, g_shift_limit=3, b_shift_limit=3, p=1.0),
    #     A.RandomGamma(gamma_limit=(98, 102), p=1.0),
    # ], p=0.3),  # Lighter color shift and reduced prob

    # # Light noise
    # A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, elementwise=False, p=0.2),

    # Normalize + tensor
    #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], p=1.0),
    #ToTensorV2(p=1.0),
], p=prob_pipeline, seed=42)

Testing and Checking Augmentations¶

In [8]:
## Only Used to Check Augmentation
# dataset = GF7Dataset(
#     image_dir = "data/GF-7 Building (3Bands)/Train/image",
#     mask_dir = "data/GF-7 Building (3Bands)/Train/label",
#     image_size=224, 
#     transform=lite_tranform_pipeline
# )

# loader = DataLoader(dataset, batch_size=8, shuffle=True)

# # Iterate over a batch
# for images, masks in loader:
#     print(images.shape)  # [B, 3, 224, 224]
#     print(masks.shape)   # [B, 1, 224, 224]
#     break

# print(f"Number of samples in dataset: {len(dataset)}")

# import matplotlib.pyplot as plt
# import numpy as np

# # Get a batch from your loader
# for images, masks in loader:
#     for i in range(min(4, images.shape[0])):
#         img = images[i].cpu().numpy()
#         mask = masks[i].cpu().numpy().squeeze(0)
#         img = np.transpose(img, (1, 2, 0))

#         print(f"Original - min: {img.min():.4f}, max: {img.max():.4f}, mean: {img.mean():.4f}")
        
#         # More robust check for normalization
#         if abs(img.mean()) > 0.5 or img.min() < -0.1:  # Likely normalized
#             print("Detected normalized image, unnormalizing...")
#             mean = np.array([0.485, 0.456, 0.406])
#             std = np.array([0.229, 0.224, 0.225])
#             img = (img * std) + mean
#             print(f"After unnorm - min: {img.min():.4f}, max: {img.max():.4f}")
            
#         img = np.clip(img, 0, 1)
        
#         plt.figure(figsize=(6, 3))
#         plt.subplot(1, 2, 1)
#         plt.imshow(img)
#         plt.title("Image")
#         plt.axis('off')
#         plt.subplot(1, 2, 2)
#         plt.imshow(mask, cmap='gray')
#         plt.title("Mask")
#         plt.axis('off')
#         plt.show()
#     break

ResNet¶

In [9]:
def get_r50_b16_config():
    """Returns the Resnet50 + ViT-B/16 configuration."""
    config = get_b16_config()
    config.patches.grid = (16, 16)
    config.resnet = ml_collections.ConfigDict()
    config.resnet.num_layers = (3, 4, 9)
    config.resnet.width_factor = 1

    config.classifier = 'seg'
    config.pretrained_path = '../model/ViT-B_16.npz'
    config.decoder_channels = (256, 128, 64, 16)
    config.skip_channels = [512, 256, 64, 16]
    config.n_classes = 2
    config.n_skip = 3
    config.activation = 'softmax'

    return config
In [10]:
from os.path import join as pjoin
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F

def np2th(weights, conv=False):
    """Possibly convert HWIO to OIHW."""
    if conv:
        weights = weights.transpose([3, 2, 0, 1])
    return torch.from_numpy(weights)

#standardize the weights before doing convolution
#weight -->(output_channel,input_channel,kernel_size[0],kernel_size[1])
#So compute mean and variance for each input_channel*kernel_size[0]*kernel_size[1]
class StdConv2d(nn.Conv2d):

    def forward(self, x):
        w = self.weight
        v, m = torch.var_mean(w, dim=[1, 2, 3], keepdim=True, unbiased=False)
        w = (w - m) / torch.sqrt(v + 1e-5)
        return F.conv2d(x, w, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

#do convolution using StdConv2d
def conv3x3(cin, cout, stride=1, groups=1, bias=False):
    return StdConv2d(cin, cout, kernel_size=3, stride=stride,
                     padding=1, bias=bias, groups=groups)


def conv1x1(cin, cout, stride=1, bias=False):
    return StdConv2d(cin, cout, kernel_size=1, stride=stride,
                     padding=0, bias=bias)

Pre-Activation Bottleneck Block¶

In [11]:
class PreActBottleneck(nn.Module):
    """Pre-activation (v2) bottleneck block.
    """

    def __init__(self, cin, cout=None, cmid=None, stride=1):
        super().__init__()
        cout = cout or cin
        cmid = cmid or cout//4

        self.gn1 = nn.GroupNorm(32, cmid, eps=1e-6)
        self.conv1 = conv1x1(cin, cmid, bias=False)
        self.gn2 = nn.GroupNorm(32, cmid, eps=1e-6)
        self.conv2 = conv3x3(cmid, cmid, stride, bias=False)  # Original code has it on conv1!!
        self.gn3 = nn.GroupNorm(32, cout, eps=1e-6)
        self.conv3 = conv1x1(cmid, cout, bias=False)
        self.relu = nn.ReLU(inplace=True)

        if (stride != 1 or cin != cout):
            # Projection also with pre-activation according to paper.
            self.downsample = conv1x1(cin, cout, stride, bias=False)
            self.gn_proj = nn.GroupNorm(cout, cout)

    def forward(self, x):

        # Residual branch
        residual = x
        if hasattr(self, 'downsample'):
            residual = self.downsample(x)
            residual = self.gn_proj(residual)

        # Unit's branch
        y = self.relu(self.gn1(self.conv1(x)))
        y = self.relu(self.gn2(self.conv2(y)))
        y = self.gn3(self.conv3(y))

        y = self.relu(residual + y)
        return y

    def load_from(self, weights, n_block, n_unit):
        conv1_weight = np2th(weights[pjoin(n_block, n_unit, "conv1/kernel").replace('\\', '/')], conv=True)
        conv2_weight = np2th(weights[pjoin(n_block, n_unit, "conv2/kernel").replace('\\', '/')], conv=True)
        conv3_weight = np2th(weights[pjoin(n_block, n_unit, "conv3/kernel").replace('\\', '/')], conv=True)

        gn1_weight = np2th(weights[pjoin(n_block, n_unit, "gn1/scale").replace('\\', '/')])
        gn1_bias = np2th(weights[pjoin(n_block, n_unit, "gn1/bias").replace('\\', '/')])

        gn2_weight = np2th(weights[pjoin(n_block, n_unit, "gn2/scale").replace('\\', '/')])
        gn2_bias = np2th(weights[pjoin(n_block, n_unit, "gn2/bias").replace('\\', '/')])

        gn3_weight = np2th(weights[pjoin(n_block, n_unit, "gn3/scale").replace('\\', '/')])
        gn3_bias = np2th(weights[pjoin(n_block, n_unit, "gn3/bias").replace('\\', '/')])

        self.conv1.weight.copy_(conv1_weight)
        self.conv2.weight.copy_(conv2_weight)
        self.conv3.weight.copy_(conv3_weight)

        self.gn1.weight.copy_(gn1_weight.view(-1))
        self.gn1.bias.copy_(gn1_bias.view(-1))

        self.gn2.weight.copy_(gn2_weight.view(-1))
        self.gn2.bias.copy_(gn2_bias.view(-1))

        self.gn3.weight.copy_(gn3_weight.view(-1))
        self.gn3.bias.copy_(gn3_bias.view(-1))

        if hasattr(self, 'downsample'):
            proj_conv_weight = np2th(weights[pjoin(n_block, n_unit, "conv_proj/kernel").replace('\\', '/')], conv=True)
            proj_gn_weight = np2th(weights[pjoin(n_block, n_unit, "gn_proj/scale").replace('\\', '/')])
            proj_gn_bias = np2th(weights[pjoin(n_block, n_unit, "gn_proj/bias").replace('\\', '/')])

            self.downsample.weight.copy_(proj_conv_weight)
            self.gn_proj.weight.copy_(proj_gn_weight.view(-1))
            self.gn_proj.bias.copy_(proj_gn_bias.view(-1))

ResNet V2¶

In [12]:
class ResNetV2(nn.Module):
    """Implementation of Pre-activation (v2) ResNet mode."""

    def __init__(self, block_units, width_factor):
        super().__init__()
        width = int(64 * width_factor)
        self.width = width

        self.root = nn.Sequential(OrderedDict([
            ('conv', StdConv2d(3, width, kernel_size=7, stride=2, bias=False, padding=3)),
            ('gn', nn.GroupNorm(32, width, eps=1e-6)),
            ('relu', nn.ReLU(inplace=True)),
            # ('pool', nn.MaxPool2d(kernel_size=3, stride=2, padding=0))
        ]))

        self.body = nn.Sequential(OrderedDict([
            ('block1', nn.Sequential(OrderedDict(
                [('unit1', PreActBottleneck(cin=width, cout=width*4, cmid=width))] +
                [(f'unit{i:d}', PreActBottleneck(cin=width*4, cout=width*4, cmid=width)) for i in range(2, block_units[0] + 1)],
                ))),
            ('block2', nn.Sequential(OrderedDict(
                [('unit1', PreActBottleneck(cin=width*4, cout=width*8, cmid=width*2, stride=2))] +
                [(f'unit{i:d}', PreActBottleneck(cin=width*8, cout=width*8, cmid=width*2)) for i in range(2, block_units[1] + 1)],
                ))),
            ('block3', nn.Sequential(OrderedDict(
                [('unit1', PreActBottleneck(cin=width*8, cout=width*16, cmid=width*4, stride=2))] +
                [(f'unit{i:d}', PreActBottleneck(cin=width*16, cout=width*16, cmid=width*4)) for i in range(2, block_units[2] + 1)],
                ))),
        ]))
    
    def forward(self, x):
        features = []
        b, c, in_size, _ = x.size()
        x = self.root(x)
        features.append(x)
        x = nn.MaxPool2d(kernel_size=3, stride=2, padding=0)(x)
        for i in range(len(self.body)-1):
            #According to the paper, the output of the resnet
            #blocks is concatenated with the decoder features, so the height
            #and width have to match
            x = self.body[i](x)
            right_size = int(in_size / 4 / (i+1))
            if x.size()[2] != right_size:
                pad = right_size - x.size()[2]
                assert pad < 3 and pad > 0, "x {} should {}".format(x.size(), right_size)
                feat = torch.zeros((b, x.size()[1], right_size, right_size), device=x.device)
                feat[:, :, 0:x.size()[2], 0:x.size()[3]] = x[:]
            else:
                feat = x
            features.append(feat)
        x = self.body[-1](x)
        return x, features[::-1]
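The zero-padding branch in `forward` exists because the unpadded 3×3 max-pool leaves the feature map one pixel short of the nominal in_size/4 target. Tracing the sizes for a 224×224 input in pure Python (matching the strides and paddings above):

```python
in_size = 224

root = (in_size + 2 * 3 - 7) // 2 + 1   # 7x7 conv, stride 2, pad 3 -> 112
pooled = (root - 3) // 2 + 1            # 3x3 max-pool, stride 2, NO padding -> 55

# Skip-feature targets for the two loop iterations (i = 0, 1):
targets = [int(in_size / 4 / (i + 1)) for i in range(2)]   # [56, 28]
print(pooled, targets)

# block1 keeps stride 1, so its output is 55: one short of the 56 target,
# and the forward pass zero-pads it (pad = 1 satisfies the 0 < pad < 3 assert).
pad = targets[0] - pooled
assert 0 < pad < 3

# block2 downsamples 55 -> 28 (3x3 conv, stride 2, pad 1), hitting 28 exactly.
block2 = (pooled + 2 * 1 - 3) // 2 + 1
assert block2 == targets[1]
```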
In [13]:
# coding=utf-8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import copy
import logging
import math
import ml_collections
from os.path import join as pjoin

import torch
import torch.nn as nn
import numpy as np

from torch.nn import CrossEntropyLoss, Dropout, Softmax, Linear, Conv2d, LayerNorm
from torch.nn.modules.utils import _pair
from scipy import ndimage
logger = logging.getLogger(__name__)
def get_b16_config():
    """Returns the ViT-B/16 configuration."""
    config = ml_collections.ConfigDict()
    config.patches = ml_collections.ConfigDict({'size': (16, 16)})
    config.hidden_size = 768
    config.transformer = ml_collections.ConfigDict()
    config.transformer.mlp_dim = 3072
    config.transformer.num_heads = 12
    config.transformer.num_layers = 12
    config.transformer.attention_dropout_rate = 0.0
    config.transformer.dropout_rate = 0.1

    config.classifier = 'seg'
    config.representation_size = None
    config.resnet_pretrained_path = None
    config.pretrained_path = '../model/vit_checkpoint/imagenet21k/ViT-B_16.npz'
    config.patch_size = 16

    config.decoder_channels = (256, 128, 64, 16)
    config.n_classes = 2
    config.activation = 'softmax'
    return config

def get_r50_b16_config():
    """Returns the Resnet50 + ViT-B/16 configuration."""
    config = get_b16_config()
    config.patches.grid = (16, 16)
    config.resnet = ml_collections.ConfigDict()
    config.resnet.num_layers = (3, 4, 9)
    config.resnet.width_factor = 1

    config.classifier = 'seg'
    config.pretrained_path = '../model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz'
    config.decoder_channels = (256, 128, 64, 16)
    config.skip_channels = [512, 256, 64, 16]
    config.n_classes = 2
    config.n_skip = 3
    config.activation = 'softmax'

    return config
CONFIGS = {
    'ViT-B_16': get_b16_config(),
    'R50-ViT-B_16': get_r50_b16_config(),

}

Vision Transformer (ViT)¶

Embedding Layer¶

In [14]:
class Embeddings(nn.Module):
    """Construct the embeddings from patch, position embeddings.
    """
    def __init__(self, config, img_size, in_channels=3):
        super(Embeddings, self).__init__()
        self.hybrid = None
        self.config = config
        img_size = _pair(img_size)
        #print(config.patches.get("grid"))
        #print(img_size)
        if config.patches.get("grid") is not None:   # ResNet
            grid_size = config.patches["grid"]
            #print(grid_size)
            patch_size = (img_size[0] // 16 // grid_size[0], img_size[1] // 16 // grid_size[1])
            patch_size_real = (patch_size[0] * 16, patch_size[1] * 16)
            #print(patch_size,patch_size_real)
            n_patches = (img_size[0] // patch_size_real[0]) * (img_size[1] // patch_size_real[1])  
            
            self.hybrid = True
        else:
            patch_size = _pair(config.patches["size"])
            n_patches = (img_size[0] // patch_size[0]) * (img_size[1] // patch_size[1])
            self.hybrid = False

        if self.hybrid:
            self.hybrid_model = ResNetV2(block_units=config.resnet.num_layers, width_factor=config.resnet.width_factor)
            in_channels = self.hybrid_model.width * 16
        self.patch_embeddings = Conv2d(in_channels=in_channels,
                                       out_channels=config.hidden_size,
                                       kernel_size=patch_size,
                                       stride=patch_size)
        self.position_embeddings = nn.Parameter(torch.zeros(1, n_patches, config.hidden_size))

        self.dropout = Dropout(config.transformer["dropout_rate"])


    def forward(self, x):
        if self.hybrid:
            x, features = self.hybrid_model(x)
        else:
            features = None
        x = self.patch_embeddings(x)  # (B, hidden, n_patches^(1/2), n_patches^(1/2))
        x = x.flatten(2)
        x = x.transpose(-1, -2)  # (B, n_patches, hidden)

        embeddings = x + self.position_embeddings
        embeddings = self.dropout(embeddings)
        return embeddings, features
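For the non-hybrid branch, the token count is simply the patch grid area, matching the length of `position_embeddings`; the hybrid branch instead derives its tokens from the backbone's 16×-downsampled feature map. A one-line check of the non-hybrid case (224×224 input, 16×16 patches, as in ViT-B/16):

```python
img_size, patch = 224, 16
n_patches = (img_size // patch) * (img_size // patch)
print(n_patches)  # 196
```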

Attention¶

In [15]:
ATTENTION_Q = "MultiHeadDotProductAttention_1/query"
ATTENTION_K = "MultiHeadDotProductAttention_1/key"
ATTENTION_V = "MultiHeadDotProductAttention_1/value"
ATTENTION_OUT = "MultiHeadDotProductAttention_1/out"
FC_0 = "MlpBlock_3/Dense_0"
FC_1 = "MlpBlock_3/Dense_1"
ATTENTION_NORM = "LayerNorm_0"
MLP_NORM = "LayerNorm_2"


def np2th(weights, conv=False):
    """Possibly convert HWIO to OIHW."""
    if conv:
        weights = weights.transpose([3, 2, 0, 1])
    return torch.from_numpy(weights)


def swish(x):
    return x * torch.sigmoid(x)


ACT2FN = {"gelu": torch.nn.functional.gelu, "relu": torch.nn.functional.relu, "swish": swish}


class Attention(nn.Module):
    def __init__(self, config, vis):
        super(Attention, self).__init__()
        self.vis = vis
        self.num_attention_heads = config.transformer["num_heads"]
        self.attention_head_size = int(config.hidden_size / self.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = Linear(config.hidden_size, self.all_head_size)
        self.key = Linear(config.hidden_size, self.all_head_size)
        self.value = Linear(config.hidden_size, self.all_head_size)

        self.out = Linear(config.hidden_size, config.hidden_size)
        self.attn_dropout = Dropout(config.transformer["attention_dropout_rate"])
        self.proj_dropout = Dropout(config.transformer["attention_dropout_rate"])

        self.softmax = Softmax(dim=-1)

    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(self, hidden_states):
        mixed_query_layer = self.query(hidden_states)
        mixed_key_layer = self.key(hidden_states)
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer)
        key_layer = self.transpose_for_scores(mixed_key_layer)
        value_layer = self.transpose_for_scores(mixed_value_layer)

        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        attention_probs = self.softmax(attention_scores)
        weights = attention_probs if self.vis else None
        attention_probs = self.attn_dropout(attention_probs)

        context_layer = torch.matmul(attention_probs, value_layer)
        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(*new_context_layer_shape)
        attention_output = self.out(context_layer)
        attention_output = self.proj_dropout(attention_output)
        return attention_output, weights
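The forward pass above is standard scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. A single-head NumPy sketch (random toy tensors, no dropout) showing the shapes and that each row of the attention matrix is a probability distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
Q = rng.normal(size=(n_tokens, d))
K = rng.normal(size=(n_tokens, d))
V = rng.normal(size=(n_tokens, d))

scores = Q @ K.T / np.sqrt(d)                   # (n_tokens, n_tokens), scaled as in forward
scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
out = probs @ V                                 # (n_tokens, d) weighted sum of values

print(probs.sum(axis=-1))  # each row sums to 1
print(out.shape)           # (5, 8)
```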

Multilayer Perceptron¶

In [16]:
class Mlp(nn.Module):
    def __init__(self, config):
        super(Mlp, self).__init__()
        self.fc1 = Linear(config.hidden_size, config.transformer["mlp_dim"])
        self.fc2 = Linear(config.transformer["mlp_dim"], config.hidden_size)
        self.act_fn = ACT2FN["gelu"]
        self.dropout = Dropout(config.transformer["dropout_rate"])

        self._init_weights()

    def _init_weights(self):
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.normal_(self.fc1.bias, std=1e-6)
        nn.init.normal_(self.fc2.bias, std=1e-6)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act_fn(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

Transformer block¶

In [17]:
class Block(nn.Module):
    def __init__(self, config, vis):
        super(Block, self).__init__()
        self.hidden_size = config.hidden_size
        self.attention_norm = LayerNorm(config.hidden_size, eps=1e-6)
        self.ffn_norm = LayerNorm(config.hidden_size, eps=1e-6)
        self.ffn = Mlp(config)
        self.attn = Attention(config, vis)

    def forward(self, x):
        h = x
        x = self.attention_norm(x)
        x, weights = self.attn(x)
        x = x + h

        h = x
        x = self.ffn_norm(x)
        x = self.ffn(x)
        x = x + h
        return x, weights

    def load_from(self, weights, n_block):
        ROOT = f"Transformer/encoderblock_{n_block}"
        with torch.no_grad():
            query_weight = np2th(weights[pjoin(ROOT,ATTENTION_Q,"kernel").replace('\\', '/')]).view(self.hidden_size, self.hidden_size).t()

            key_weight = np2th(weights[pjoin(ROOT, ATTENTION_K, "kernel").replace('\\', '/')]).view(self.hidden_size, self.hidden_size).t()
            value_weight = np2th(weights[pjoin(ROOT, ATTENTION_V, "kernel").replace('\\', '/')]).view(self.hidden_size, self.hidden_size).t()
            out_weight = np2th(weights[pjoin(ROOT, ATTENTION_OUT, "kernel").replace('\\', '/')]).view(self.hidden_size, self.hidden_size).t()

            query_bias = np2th(weights[pjoin(ROOT, ATTENTION_Q, "bias").replace('\\', '/')]).view(-1)
            key_bias = np2th(weights[pjoin(ROOT, ATTENTION_K, "bias").replace('\\', '/')]).view(-1)
            value_bias = np2th(weights[pjoin(ROOT, ATTENTION_V, "bias").replace('\\', '/')]).view(-1)
            out_bias = np2th(weights[pjoin(ROOT, ATTENTION_OUT, "bias").replace('\\', '/')]).view(-1)

            self.attn.query.weight.copy_(query_weight)
            self.attn.key.weight.copy_(key_weight)
            self.attn.value.weight.copy_(value_weight)
            self.attn.out.weight.copy_(out_weight)
            self.attn.query.bias.copy_(query_bias)
            self.attn.key.bias.copy_(key_bias)
            self.attn.value.bias.copy_(value_bias)
            self.attn.out.bias.copy_(out_bias)

            mlp_weight_0 = np2th(weights[pjoin(ROOT, FC_0, "kernel").replace('\\', '/')]).t()
            mlp_weight_1 = np2th(weights[pjoin(ROOT, FC_1, "kernel").replace('\\', '/')]).t()
            mlp_bias_0 = np2th(weights[pjoin(ROOT, FC_0, "bias").replace('\\', '/')]).t()
            mlp_bias_1 = np2th(weights[pjoin(ROOT, FC_1, "bias").replace('\\', '/')]).t()

            self.ffn.fc1.weight.copy_(mlp_weight_0)
            self.ffn.fc2.weight.copy_(mlp_weight_1)
            self.ffn.fc1.bias.copy_(mlp_bias_0)
            self.ffn.fc2.bias.copy_(mlp_bias_1)

            self.attention_norm.weight.copy_(np2th(weights[pjoin(ROOT, ATTENTION_NORM, "scale").replace('\\', '/')]))
            self.attention_norm.bias.copy_(np2th(weights[pjoin(ROOT, ATTENTION_NORM, "bias").replace('\\', '/')]))
            self.ffn_norm.weight.copy_(np2th(weights[pjoin(ROOT, MLP_NORM, "scale").replace('\\', '/')]))
            self.ffn_norm.bias.copy_(np2th(weights[pjoin(ROOT, MLP_NORM, "bias").replace('\\', '/')]))

Encoder¶

In [18]:
class Encoder(nn.Module):
    def __init__(self, config, vis):
        super(Encoder, self).__init__()
        self.vis = vis
        self.layer = nn.ModuleList()
        self.encoder_norm = LayerNorm(config.hidden_size, eps=1e-6)
        for _ in range(config.transformer["num_layers"]):
            layer = Block(config, vis)
            self.layer.append(copy.deepcopy(layer))

    def forward(self, hidden_states):
        attn_weights = []
        for layer_block in self.layer:
            hidden_states, weights = layer_block(hidden_states)
            if self.vis:
                attn_weights.append(weights)
        encoded = self.encoder_norm(hidden_states)
        return encoded, attn_weights

Transformer¶

In [19]:
class Transformer(nn.Module):
    def __init__(self, config, img_size, vis):
        super(Transformer, self).__init__()
        self.embeddings = Embeddings(config, img_size=img_size)
        self.encoder = Encoder(config, vis)

    def forward(self, input_ids):
        embedding_output, features = self.embeddings(input_ids)
        encoded, attn_weights = self.encoder(embedding_output)  # (B, n_patch, hidden)
        return encoded, attn_weights, features

Decoder¶

In [20]:
class Conv2dReLU(nn.Sequential):
    def __init__(
            self,
            in_channels,
            out_channels,
            kernel_size,
            padding=0,
            stride=1,
            use_batchnorm=True,
    ):
        conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            bias=not (use_batchnorm),
        )
        relu = nn.ReLU(inplace=True)

        bn = nn.BatchNorm2d(out_channels)

        super(Conv2dReLU, self).__init__(conv, bn, relu)
In [21]:
class DecoderBlock(nn.Module):
    def __init__(
            self,
            in_channels,
            out_channels,
            skip_channels=0,
            use_batchnorm=True,
    ):
        super().__init__()
        self.conv1 = Conv2dReLU(
            in_channels + skip_channels,
            out_channels,
            kernel_size=3,
            padding=1,
            use_batchnorm=use_batchnorm,
        )
        self.conv2 = Conv2dReLU(
            out_channels,
            out_channels,
            kernel_size=3,
            padding=1,
            use_batchnorm=use_batchnorm,
        )
        self.up = nn.UpsamplingBilinear2d(scale_factor=2)

    def forward(self, x, skip=None):
        x = self.up(x)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)
        x = self.conv1(x)
        x = self.conv2(x)
        return x
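As a shape check, here is a minimal sketch of what one decoder step does before the convolutions, assuming illustrative channel counts (512 bottleneck channels and a 512-channel ResNet skip): 2x bilinear upsampling followed by channel-wise concatenation with the skip feature.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: a 14x14 bottleneck map and its 28x28 skip feature.
x = torch.randn(1, 512, 14, 14)     # feature map coming up from the bottleneck
skip = torch.randn(1, 512, 28, 28)  # CNN skip feature at the target resolution

# Same operation as nn.UpsamplingBilinear2d(scale_factor=2) in DecoderBlock
x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
x = torch.cat([x, skip], dim=1)     # channels add up: 512 + 512 = 1024
print(tuple(x.shape))               # (1, 1024, 28, 28)
```

The concatenated tensor is what `conv1` receives, which is why its input width is `in_channels + skip_channels`.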
In [22]:
class DecoderCup(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        head_channels = 512
        self.conv_more = Conv2dReLU(
            config.hidden_size,
            head_channels,
            kernel_size=3,
            padding=1,
            use_batchnorm=True,
        )
        decoder_channels = config.decoder_channels
        in_channels = [head_channels] + list(decoder_channels[:-1])
        out_channels = decoder_channels

        if self.config.n_skip != 0:
            skip_channels = self.config.skip_channels
            for i in range(4-self.config.n_skip):  # re-select the skip channels according to n_skip
                skip_channels[3-i]=0

        else:
            skip_channels=[0,0,0,0]

        blocks = [
            DecoderBlock(in_ch, out_ch, sk_ch) for in_ch, out_ch, sk_ch in zip(in_channels, out_channels, skip_channels)
        ]
        self.blocks = nn.ModuleList(blocks)

    def forward(self, hidden_states, features=None):
        B, n_patch, hidden = hidden_states.size()  # reshape from (B, n_patch, hidden) to (B, hidden, h, w)
        h, w = int(np.sqrt(n_patch)), int(np.sqrt(n_patch))
        x = hidden_states.permute(0, 2, 1)
        x = x.contiguous().view(B, hidden, h, w)
        x = self.conv_more(x)
        for i, decoder_block in enumerate(self.blocks):
            if features is not None:
                skip = features[i] if (i < self.config.n_skip) else None
            else:
                skip = None
            x = decoder_block(x, skip=skip)
        return x
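The first step of `DecoderCup.forward` can be sketched in isolation. This is a minimal example with assumed sizes (a 224px input with 16px patches gives 14*14 = 196 tokens): the transformer's token sequence is folded back into a square 2D feature map so convolutional layers can process it.

```python
import torch

# Assumed sizes: batch of 2, 196 tokens (14x14 grid), hidden size 768
B, n_patch, hidden = 2, 196, 768
hidden_states = torch.randn(B, n_patch, hidden)

h = w = int(n_patch ** 0.5)                # recover the 14x14 patch grid
x = hidden_states.permute(0, 2, 1)         # (B, hidden, n_patch)
x = x.contiguous().view(B, hidden, h, w)   # (B, hidden, 14, 14), ready for convs
print(tuple(x.shape))                      # (2, 768, 14, 14)
```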

Segmentation Head¶

In [23]:
class SegmentationHead(nn.Sequential):

    def __init__(self, in_channels, out_channels, kernel_size=3, upsampling=1):
        conv2d = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, padding=kernel_size // 2)
        upsampling = nn.UpsamplingBilinear2d(scale_factor=upsampling) if upsampling > 1 else nn.Identity()
        super().__init__(conv2d, upsampling)
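A quick sketch of what the head produces, assuming the default TransUNet decoder (16 output channels) and binary building segmentation (2 classes): a single 3x3 convolution maps decoder features to one logit map per class.

```python
import torch
import torch.nn as nn

# Assumed sizes: 16-channel decoder output at full 224x224 resolution, 2 classes
head = nn.Conv2d(16, 2, kernel_size=3, padding=3 // 2)  # mirrors SegmentationHead's conv
features = torch.randn(1, 16, 224, 224)
logits = head(features)
print(tuple(logits.shape))  # (1, 2, 224, 224)
```

With `upsampling=1` (the default above), the head adds no further resizing, so the decoder must already produce full-resolution features.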

Vision Transformer Class¶

In [24]:
class VisionTransformer(nn.Module):
    def __init__(self, config, img_size=224, num_classes=21843, zero_head=False, vis=False):
        super(VisionTransformer, self).__init__()
        self.num_classes = num_classes
        self.zero_head = zero_head
        self.classifier = config.classifier
        self.transformer = Transformer(config, img_size, vis)
        self.decoder = DecoderCup(config)
        self.segmentation_head = SegmentationHead(
            in_channels=config['decoder_channels'][-1],
            out_channels=config['n_classes'],
            kernel_size=3,
        )
        self.config = config

    def forward(self, x):
        if x.size()[1] == 1:
            x = x.repeat(1,3,1,1)
        x, attn_weights, features = self.transformer(x)  # (B, n_patch, hidden)
        x = self.decoder(x, features)
        logits = self.segmentation_head(x)
        return logits

    def load_from(self, weights):
        with torch.no_grad():

            res_weight = weights
            self.transformer.embeddings.patch_embeddings.weight.copy_(np2th(weights["embedding/kernel"], conv=True))
            self.transformer.embeddings.patch_embeddings.bias.copy_(np2th(weights["embedding/bias"]))

            self.transformer.encoder.encoder_norm.weight.copy_(np2th(weights["Transformer/encoder_norm/scale"]))
            self.transformer.encoder.encoder_norm.bias.copy_(np2th(weights["Transformer/encoder_norm/bias"]))

            posemb = np2th(weights["Transformer/posembed_input/pos_embedding"])

            posemb_new = self.transformer.embeddings.position_embeddings
            if posemb.size() == posemb_new.size():
                self.transformer.embeddings.position_embeddings.copy_(posemb)
            elif posemb.size()[1]-1 == posemb_new.size()[1]:
                posemb = posemb[:, 1:]
                self.transformer.embeddings.position_embeddings.copy_(posemb)
            else:
                logger.info("load_pretrained: resized variant: %s to %s" % (posemb.size(), posemb_new.size()))
                ntok_new = posemb_new.size(1)
                if self.classifier == "seg":
                    _, posemb_grid = posemb[:, :1], posemb[0, 1:]
                gs_old = int(np.sqrt(len(posemb_grid)))
                gs_new = int(np.sqrt(ntok_new))
                print('load_pretrained: grid-size from %s to %s' % (gs_old, gs_new))
                posemb_grid = posemb_grid.reshape(gs_old, gs_old, -1)
                zoom = (gs_new / gs_old, gs_new / gs_old, 1)
                posemb_grid = ndimage.zoom(posemb_grid, zoom, order=1)  # th2np
                posemb_grid = posemb_grid.reshape(1, gs_new * gs_new, -1)
                posemb = posemb_grid
                self.transformer.embeddings.position_embeddings.copy_(np2th(posemb))

            # Encoder whole
            for bname, block in self.transformer.encoder.named_children():
                for uname, unit in block.named_children():
                    unit.load_from(weights, n_block=uname)

            if self.transformer.embeddings.hybrid:
                self.transformer.embeddings.hybrid_model.root.conv.weight.copy_(np2th(res_weight["conv_root/kernel"], conv=True))
                gn_weight = np2th(res_weight["gn_root/scale"]).view(-1)
                gn_bias = np2th(res_weight["gn_root/bias"]).view(-1)
                self.transformer.embeddings.hybrid_model.root.gn.weight.copy_(gn_weight)
                self.transformer.embeddings.hybrid_model.root.gn.bias.copy_(gn_bias)

                for bname, block in self.transformer.embeddings.hybrid_model.body.named_children():
                    for uname, unit in block.named_children():
                        unit.load_from(res_weight, n_block=bname, n_unit=uname)
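The positional-embedding resize in `load_from` can be sketched on its own. This is a minimal example with assumed sizes (a hypothetical 24x24 pretrained grid resized to 14x14): the 1D token sequence is reshaped into its 2D grid, bilinearly interpolated to the new grid size, and flattened back, which is what lets ViT weights pretrained at one resolution be reused at another.

```python
import numpy as np
from scipy import ndimage

# Assumed sizes: 24x24 pretrained grid -> 14x14 target grid, hidden size 768
gs_old, gs_new, dim = 24, 14, 768
posemb_grid = np.random.rand(gs_old * gs_old, dim)

grid = posemb_grid.reshape(gs_old, gs_old, dim)
zoom = (gs_new / gs_old, gs_new / gs_old, 1)      # scale spatial dims, keep channels
grid = ndimage.zoom(grid, zoom, order=1)           # order=1: bilinear interpolation
posemb = grid.reshape(1, gs_new * gs_new, dim)
print(posemb.shape)  # (1, 196, 768)
```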

Loss Function¶

In [25]:
class DiceLoss(nn.Module):
    def __init__(self, n_classes):
        super(DiceLoss, self).__init__()
        self.n_classes = n_classes

    def _one_hot_encoder(self, input_tensor):
        tensor_list = []
        for i in range(self.n_classes):
            temp_prob = input_tensor == i  # * torch.ones_like(input_tensor)
            tensor_list.append(temp_prob.unsqueeze(1))
        output_tensor = torch.cat(tensor_list, dim=1)
        return output_tensor.float()

    def _dice_loss(self, score, target):
        target = target.float()
        smooth = 1e-5
        intersect = torch.sum(score * target)
        y_sum = torch.sum(target * target)
        z_sum = torch.sum(score * score)
        loss = (2 * intersect + smooth) / (z_sum + y_sum + smooth)
        loss = 1 - loss
        return loss

    def forward(self, inputs, target, weight=None, softmax=False):
        if softmax:
            inputs = torch.softmax(inputs, dim=1)
        target = self._one_hot_encoder(target)
        if weight is None:
            weight = [1] * self.n_classes
        assert inputs.size() == target.size(), 'predict {} & target {} shape do not match'.format(inputs.size(), target.size())
        class_wise_dice = []
        loss = 0.0
        for i in range(0, self.n_classes):
            dice = self._dice_loss(inputs[:, i], target[:, i])
            class_wise_dice.append(1.0 - dice.item())
            loss += dice * weight[i]
        return loss / self.n_classes
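A tiny worked example of the soft Dice term above, using made-up numbers for one class: the loss is 1 - (2|P·T| + eps) / (|P|² + |T|² + eps), so near-perfect soft predictions give a loss close to zero.

```python
import torch

# Made-up soft predictions and binary ground truth for a single class
smooth = 1e-5
score = torch.tensor([0.9, 0.8, 0.1, 0.2])   # soft predictions
target = torch.tensor([1.0, 1.0, 0.0, 0.0])  # binary ground truth

intersect = torch.sum(score * target)                          # 0.9 + 0.8 = 1.7
denom = torch.sum(score * score) + torch.sum(target * target)  # 1.5 + 2.0 = 3.5
loss = 1 - (2 * intersect + smooth) / (denom + smooth)
print(round(loss.item(), 4))  # ~0.0286
```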

Training¶

In [26]:
# Libraries Used for Training
import argparse
import logging
import os
import random
import sys
import time
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from tensorboardX import SummaryWriter
from torch.nn.modules.loss import CrossEntropyLoss
from torch.utils.data import DataLoader
import torch.backends.cudnn as cudnn
from tqdm import tqdm
from torchvision import transforms

from tqdm.notebook import tqdm  # use the notebook-friendly progress bar in place of the CLI one above

Trainer¶

In [27]:
def trainer_synapse(args, model, snapshot_path):

    logging.basicConfig(filename=snapshot_path + "/log.txt", level=logging.INFO,
                        format='[%(asctime)s.%(msecs)03d] %(message)s', datefmt='%H:%M:%S')
    logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))
    logging.info(str(args))

    base_lr = args.base_lr
    num_classes = args.num_classes
    batch_size = args.batch_size * args.n_gpu

    # Use GF7Dataset
    db_train = GF7Dataset(
        image_dir=args.image_dir,
        mask_dir=args.mask_dir,
        image_size=args.img_size,
        transform=tranform_pipline  # Albumentations augmentation pipeline from an earlier cell
    )

    print("The length of train set is: {}".format(len(db_train)))

    def worker_init_fn(worker_id):
        random.seed(args.seed + worker_id)

    trainloader = DataLoader(
        db_train,
        batch_size=batch_size,
        shuffle=True,
        num_workers=0,
        pin_memory=True,
        worker_init_fn=worker_init_fn
    )

    if args.n_gpu > 1:
        model = nn.DataParallel(model)

    model.train()
    ce_loss = CrossEntropyLoss()
    dice_loss = DiceLoss(num_classes)
    optimizer = optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.0001, amsgrad=True)  # Changed to AdamW (cut training time by almost 50%); amsgrad enabled
    
    max_epoch = args.max_epochs
    max_iterations = max_epoch * len(trainloader)
    
    # Num restarts over full training
    num_restarts = 3
    
    # Compute adaptive T_0 based on total iterations
    T_0 = max_iterations // num_restarts
    
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer,
        T_0=T_0,
        T_mult=1,           # keep restarts evenly spaced
        eta_min=1e-6        # stable minimum learning rate for Adam
    )
        
    writer = SummaryWriter(snapshot_path + '/log')

    iter_num = 0
    logging.info("{} iterations per epoch. {} max iterations ".format(len(trainloader), max_iterations))

    start_time = time.time()
    
    total_loss_sum = 0.0
    total_loss_count = 0
    
    conditional_saves_count = 0
    max_conditional_saves = 5
    
    iterator = tqdm(range(max_epoch), ncols=500, desc="Epoch", leave=False)  # leave=False to avoid cluttering the output
    for epoch_num in iterator:
        for i_batch, (image_batch, label_batch) in enumerate(trainloader):  # Tuple unpacking
            image_batch, label_batch = image_batch.cuda(), label_batch.cuda()

            outputs = model(image_batch)
            loss_ce = ce_loss(outputs, label_batch.long().squeeze(1))  # squeeze if shape is [B, 1, H, W]
            loss_dice = dice_loss(outputs, label_batch.squeeze(1), softmax=True)
            loss = 0.5 * loss_ce + 0.5 * loss_dice

            if iter_num > max_iterations - 10:
                total_loss_sum += loss.item()
                total_loss_count += 1

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Step the scheduler (use epoch + batch/len(trainloader) for smooth schedule)
            scheduler.step(epoch_num + i_batch / len(trainloader))

            # Retired: polynomial learning-rate decay (leftover from the original
            # SGD code), replaced by the cosine warm-restart scheduler above.
            # lr_ = base_lr * (1.0 - iter_num / max_iterations) ** 0.9
            # for param_group in optimizer.param_groups:
            #     param_group['lr'] = lr_

            iter_num += 1
            current_lr = optimizer.param_groups[0]['lr']
            writer.add_scalar('info/lr', current_lr, iter_num)
            writer.add_scalar('info/total_loss', loss, iter_num)
            writer.add_scalar('info/loss_ce', loss_ce, iter_num)

            #logging.info('iteration %d : loss : %f, loss_ce: %f' % (iter_num, loss.item(), loss_ce.item()))

            # Optionally, only log every N iterations
            if iter_num % 10 == 0:
                logging.info('iteration %d : loss : %f, loss_ce: %f' % (iter_num, loss.item(), loss_ce.item()))
                
            if iter_num > max_iterations - 10:
                logging.info('iteration %d : loss : %f, loss_ce: %f' % (iter_num, loss.item(), loss_ce.item()))

            if iter_num % 20 == 0:
                image = image_batch[1, 0:1, :, :]
                image = (image - image.min()) / (image.max() - image.min())
                writer.add_image('train/Image', image, iter_num)
                outputs_vis = torch.argmax(torch.softmax(outputs, dim=1), dim=1, keepdim=True)
                writer.add_image('train/Prediction', outputs_vis[1, ...] * 50, iter_num)
                labs = label_batch[1, ...] * 50  # scale binary mask values for visibility in TensorBoard
                writer.add_image('train/GroundTruth', labs, iter_num)

        save_interval = 50
        if epoch_num > int(max_epoch / 2) and (epoch_num + 1) % save_interval == 0:
            save_mode_path = os.path.join(snapshot_path, f'epoch_{epoch_num}.pth')
            torch.save(model.state_dict(), save_mode_path)
            logging.info(f"save model to {save_mode_path}")

        save_interval_2 = 5  # change back to 10 after this run
        if epoch_num > 48 and (epoch_num + 1) % save_interval_2 == 0:
            save_mode_path = os.path.join(snapshot_path, f'epoch_{epoch_num}_iter_{iter_num}.pth')
            torch.save(model.state_dict(), save_mode_path)
            logging.info(f"save model to {save_mode_path}")
            
        # If CE Loss Less Than 0.06 Save Model (Limited to 5 saves)
        try:
            if loss_ce.item() < 0.06 and conditional_saves_count < max_conditional_saves:
                save_mode_path = os.path.join(snapshot_path, f'LOW_CE_epoch_{epoch_num}_iter_{iter_num}_loss_{loss_ce.item():.4f}.pth')
                torch.save(model.state_dict(), save_mode_path)
                logging.info(f"save model to {save_mode_path} with loss {loss_ce.item():.4f}")
                conditional_saves_count += 1
                logging.info(f"Conditional saves: {conditional_saves_count}/{max_conditional_saves}")
        except Exception as e:
            logging.warning(f"Failed to save model at epoch {epoch_num}, iter {iter_num}: {e}")
            # Continue training without interruption
            

        if epoch_num >= max_epoch - 1:
            save_mode_path = os.path.join(snapshot_path, f'epoch_{epoch_num}.pth')
            torch.save(model.state_dict(), save_mode_path)
            logging.info(f"save model to {save_mode_path}")
            iterator.close()
            break

    writer.close()
    
    # Calculate and print total time and average seconds per iteration
    total_time = time.time() - start_time
    avg_time_per_iter = total_time / iter_num if iter_num > 0 else 0
    avg_loss = total_loss_sum / total_loss_count if total_loss_count > 0 else 0

    print("------Training Stats------")
    print(f"Training finished in {total_time:.2f} seconds ({total_time/60:.2f} minutes).")
    print(f"Average time per iteration: {avg_time_per_iter:.2f}s/it")
    print(f"Average loss: {avg_loss:.4f}")

    return "Training Finished!"
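The warm-restart schedule set up in the trainer can be sketched in isolation. This is a minimal example with assumed numbers (a dummy linear model, T_0 = 100 steps, 3 cycles over 300 steps): with `T_mult=1` the learning rate traces identical cosine cycles, dipping toward `eta_min` and jumping back to the base rate at each restart.

```python
import torch

# Dummy model and assumed schedule lengths, mirroring the trainer's settings
model = torch.nn.Linear(4, 2)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    opt, T_0=100, T_mult=1, eta_min=1e-6
)

lrs = []
for step in range(300):
    sched.step(step)  # fractional epochs also work, as the trainer's call shows
    lrs.append(opt.param_groups[0]['lr'])

print(max(lrs))  # base rate, hit at the start of each of the 3 cycles
print(min(lrs))  # close to eta_min, at the end of each cycle
```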

Training Arguments¶

In [28]:
import argparse

parser = argparse.ArgumentParser()

# Original args (unchanged)
parser.add_argument('--dataset', type=str, default='GF7')
parser.add_argument('--num_classes', type=int, default=2)
parser.add_argument('--max_iterations', type=int, default=30000)
parser.add_argument('--max_epochs', type=int, default=8)
parser.add_argument('--batch_size', type=int, default=8)
parser.add_argument('--n_gpu', type=int, default=1)
parser.add_argument('--deterministic', type=int, default=1) # Make it 1 for reproducibility
parser.add_argument('--base_lr', type=float, default=0.001)
parser.add_argument('--img_size', type=int, default=224)
parser.add_argument('--seed', type=int, default=42)
parser.add_argument('--n_skip', type=int, default=3)
parser.add_argument('--vit_name', type=str, default='R50-ViT-B_16')
parser.add_argument('--vit_patches_size', type=int, default=16)

# Add these two for GF7Dataset
parser.add_argument('--image_dir', type=str, help='Path to satellite images')
parser.add_argument('--mask_dir', type=str, help='Path to segmentation masks')

# set Epochs common variable and convert to string
epc = '163'

# Parse args manually for notebook
args = parser.parse_args(args=[
    '--dataset', 'GF7',
    '--num_classes', '2',
    '--max_epochs', epc, # roughly: 20 epochs ~ 30 min, 50 epochs ~ 1 hour
    '--batch_size', '25', # optimized for my machine; changed from 20 to 25
    '--n_gpu', '1',
    '--base_lr', '0.001', # base learning rate; scaled by batch size in the setup cell below
    '--img_size', '224',
    '--seed', '42',
    '--n_skip', '3',
    '--vit_name', 'R50-ViT-B_16',
    '--vit_patches_size', '16',
    '--image_dir', 'data/GF-7 Building (3Bands)/Train/image', # Change this Back    
    '--mask_dir', 'data/GF-7 Building (3Bands)/Train/label' # Change This Back
])

print(args)
Namespace(dataset='GF7', num_classes=2, max_iterations=30000, max_epochs=163, batch_size=25, n_gpu=1, deterministic=1, base_lr=0.001, img_size=224, seed=42, n_skip=3, vit_name='R50-ViT-B_16', vit_patches_size=16, image_dir='data/GF-7 Building (3Bands)/Train/image', mask_dir='data/GF-7 Building (3Bands)/Train/label')
In [29]:
# -----------------------
# Environment Setup
# -----------------------
import os
import random
import numpy as np
import torch
from torch.backends import cudnn

if not args.deterministic:
    cudnn.benchmark = True
    cudnn.deterministic = False
else:
    cudnn.benchmark = False
    cudnn.deterministic = True

random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
torch.cuda.manual_seed(args.seed)

# -----------------------
# Dataset Configuration
# -----------------------
dataset_name = 'GF7'
dataset_config = {
    'GF7': {
        'image_dir': args.image_dir,
        'mask_dir': args.mask_dir,
        'num_classes': 2
    }
}

if args.batch_size != 24 and args.batch_size % 6 == 0:
    args.base_lr *= args.batch_size / 24

args.dataset = dataset_name
args.num_classes = dataset_config[dataset_name]['num_classes']
args.image_dir = dataset_config[dataset_name]['image_dir']
args.mask_dir = dataset_config[dataset_name]['mask_dir']
args.is_pretrain = True

# -----------------------
# Snapshot Path
# -----------------------
args.exp = f'TU_{dataset_name}{args.img_size}'
snapshot_path = f"model/{args.exp}/TU"
snapshot_path += '_pretrain' if args.is_pretrain else ''
snapshot_path += f"_{args.vit_name}_skip{args.n_skip}"
if args.vit_patches_size != 16:
    snapshot_path += f"_vitpatch{args.vit_patches_size}"
if args.max_iterations != 30000:
    snapshot_path += f"_{str(args.max_iterations)[:2]}k"
if args.max_epochs != 30:
    snapshot_path += f"_epo{args.max_epochs}"
snapshot_path += f"_bs{args.batch_size}"
if args.base_lr != 0.01:
    snapshot_path += f"_lr{args.base_lr}"
snapshot_path += f"_{args.img_size}"
if args.seed != 1234:
    snapshot_path += f"_s{args.seed}"

# Create snapshot directory
if not os.path.exists(snapshot_path):
    os.makedirs(snapshot_path)

# -----------------------
# ViT Config and Model
# -----------------------
# Assumes CONFIGS and VisionTransformer were already defined in earlier cells
config_vit = CONFIGS[args.vit_name]
config_vit.n_classes = args.num_classes
config_vit.n_skip = args.n_skip
config_vit.patches.size = (args.vit_patches_size, args.vit_patches_size)

if 'R50' in args.vit_name:
    grid_size = int(args.img_size / args.vit_patches_size)
    config_vit.patches.grid = (grid_size, grid_size)

# Build model
net = VisionTransformer(config_vit, img_size=args.img_size, num_classes=config_vit.n_classes).cuda()

# -----------------------
# Call Trainer
# -----------------------

to_train = 1

if to_train == 1:
    trainer = {'GF7': trainer_synapse}
    trainer[dataset_name](args, net, snapshot_path)
else:
    print('Training disabled in this notebook. Set to_train = 1 to train the model.')
Namespace(dataset='GF7', num_classes=2, max_iterations=30000, max_epochs=163, batch_size=25, n_gpu=1, deterministic=1, base_lr=0.001, img_size=224, seed=42, n_skip=3, vit_name='R50-ViT-B_16', vit_patches_size=16, image_dir='data/GF-7 Building (3Bands)/Train/image', mask_dir='data/GF-7 Building (3Bands)/Train/label', is_pretrain=True, exp='TU_GF7224')
The length of train set is: 3106
125 iterations per epoch. 20375 max iterations 
iteration 10 : loss : 0.336394, loss_ce: 0.381616
iteration 20 : loss : 0.377212, loss_ce: 0.453101
iteration 30 : loss : 0.313751, loss_ce: 0.361445
iteration 40 : loss : 0.301531, loss_ce: 0.346111
iteration 50 : loss : 0.283308, loss_ce: 0.318545
iteration 60 : loss : 0.283654, loss_ce: 0.324121
iteration 70 : loss : 0.265691, loss_ce: 0.303735
iteration 80 : loss : 0.314381, loss_ce: 0.377045
iteration 90 : loss : 0.282240, loss_ce: 0.337154
iteration 100 : loss : 0.279577, loss_ce: 0.336668
iteration 110 : loss : 0.270506, loss_ce: 0.320831
iteration 120 : loss : 0.242190, loss_ce: 0.287483
iteration 130 : loss : 0.238785, loss_ce: 0.271927
iteration 140 : loss : 0.238487, loss_ce: 0.275511
iteration 150 : loss : 0.258746, loss_ce: 0.296714
iteration 160 : loss : 0.302955, loss_ce: 0.360846
iteration 170 : loss : 0.259516, loss_ce: 0.306980
iteration 180 : loss : 0.213133, loss_ce: 0.244600
iteration 190 : loss : 0.233652, loss_ce: 0.271762
iteration 200 : loss : 0.312271, loss_ce: 0.382965
iteration 210 : loss : 0.262339, loss_ce: 0.313153
iteration 220 : loss : 0.217594, loss_ce: 0.244657
iteration 230 : loss : 0.270938, loss_ce: 0.315713
iteration 240 : loss : 0.195922, loss_ce: 0.230183
iteration 250 : loss : 0.363110, loss_ce: 0.463339
iteration 260 : loss : 0.234748, loss_ce: 0.258183
iteration 270 : loss : 0.203454, loss_ce: 0.239257
iteration 280 : loss : 0.260863, loss_ce: 0.310662
iteration 290 : loss : 0.207386, loss_ce: 0.254807
iteration 300 : loss : 0.234931, loss_ce: 0.289112
iteration 310 : loss : 0.225256, loss_ce: 0.259783
iteration 320 : loss : 0.238669, loss_ce: 0.275243
iteration 330 : loss : 0.199239, loss_ce: 0.239285
iteration 340 : loss : 0.200147, loss_ce: 0.235756
iteration 350 : loss : 0.199007, loss_ce: 0.229110
iteration 360 : loss : 0.237030, loss_ce: 0.287460
iteration 370 : loss : 0.202248, loss_ce: 0.244368
iteration 380 : loss : 0.215112, loss_ce: 0.238273
iteration 390 : loss : 0.258419, loss_ce: 0.322606
iteration 400 : loss : 0.170827, loss_ce: 0.190041
iteration 410 : loss : 0.216367, loss_ce: 0.258295
iteration 420 : loss : 0.213083, loss_ce: 0.241252
iteration 430 : loss : 0.224040, loss_ce: 0.275907
iteration 440 : loss : 0.207150, loss_ce: 0.247432
iteration 450 : loss : 0.271496, loss_ce: 0.342697
iteration 460 : loss : 0.209615, loss_ce: 0.242132
iteration 470 : loss : 0.198297, loss_ce: 0.228929
iteration 480 : loss : 0.229728, loss_ce: 0.272525
iteration 490 : loss : 0.174491, loss_ce: 0.200004
iteration 500 : loss : 0.141818, loss_ce: 0.167250
iteration 510 : loss : 0.172623, loss_ce: 0.202727
iteration 520 : loss : 0.223784, loss_ce: 0.276039
iteration 530 : loss : 0.202529, loss_ce: 0.234023
iteration 540 : loss : 0.239584, loss_ce: 0.247712
iteration 550 : loss : 0.196162, loss_ce: 0.234639
iteration 560 : loss : 0.211497, loss_ce: 0.255601
iteration 570 : loss : 0.263588, loss_ce: 0.337072
iteration 580 : loss : 0.204987, loss_ce: 0.222094
iteration 590 : loss : 0.213874, loss_ce: 0.251177
iteration 600 : loss : 0.232156, loss_ce: 0.287476
iteration 610 : loss : 0.230212, loss_ce: 0.292716
iteration 620 : loss : 0.168221, loss_ce: 0.186499
iteration 630 : loss : 0.194751, loss_ce: 0.227969
iteration 640 : loss : 0.244145, loss_ce: 0.315067
iteration 650 : loss : 0.193916, loss_ce: 0.224002
iteration 660 : loss : 0.242250, loss_ce: 0.297129
iteration 670 : loss : 0.199195, loss_ce: 0.243315
iteration 680 : loss : 0.205480, loss_ce: 0.244916
iteration 690 : loss : 0.211004, loss_ce: 0.269980
iteration 700 : loss : 0.194724, loss_ce: 0.233068
iteration 710 : loss : 0.190865, loss_ce: 0.210218
iteration 720 : loss : 0.194541, loss_ce: 0.228918
iteration 730 : loss : 0.167526, loss_ce: 0.191256
iteration 740 : loss : 0.218215, loss_ce: 0.254852
iteration 750 : loss : 0.296383, loss_ce: 0.378787
iteration 760 : loss : 0.241421, loss_ce: 0.269111
iteration 770 : loss : 0.182978, loss_ce: 0.211125
iteration 780 : loss : 0.210315, loss_ce: 0.261704
iteration 790 : loss : 0.223838, loss_ce: 0.274760
iteration 800 : loss : 0.235370, loss_ce: 0.293531
iteration 810 : loss : 0.249149, loss_ce: 0.260007
iteration 820 : loss : 0.218314, loss_ce: 0.252479
iteration 830 : loss : 0.254888, loss_ce: 0.308343
iteration 840 : loss : 0.189127, loss_ce: 0.228525
iteration 850 : loss : 0.176453, loss_ce: 0.193933
iteration 860 : loss : 0.206198, loss_ce: 0.243048
iteration 870 : loss : 0.179537, loss_ce: 0.215041
iteration 880 : loss : 0.230320, loss_ce: 0.297305
iteration 890 : loss : 0.198230, loss_ce: 0.224515
iteration 900 : loss : 0.176200, loss_ce: 0.193940
iteration 910 : loss : 0.239386, loss_ce: 0.296351
iteration 920 : loss : 0.187687, loss_ce: 0.200655
iteration 930 : loss : 0.218135, loss_ce: 0.267720
iteration 940 : loss : 0.190475, loss_ce: 0.229263
iteration 950 : loss : 0.185408, loss_ce: 0.231377
iteration 960 : loss : 0.238462, loss_ce: 0.300376
iteration 970 : loss : 0.160891, loss_ce: 0.175062
iteration 980 : loss : 0.178876, loss_ce: 0.200280
iteration 990 : loss : 0.144806, loss_ce: 0.168551
iteration 1000 : loss : 0.242546, loss_ce: 0.315109
iteration 1010 : loss : 0.227598, loss_ce: 0.286672
iteration 1020 : loss : 0.184621, loss_ce: 0.222178
iteration 1030 : loss : 0.156234, loss_ce: 0.177050
iteration 1040 : loss : 0.193741, loss_ce: 0.243864
iteration 1050 : loss : 0.181239, loss_ce: 0.212335
iteration 1060 : loss : 0.173115, loss_ce: 0.213087
iteration 1070 : loss : 0.175209, loss_ce: 0.205562
iteration 1080 : loss : 0.178152, loss_ce: 0.216612
iteration 1090 : loss : 0.202467, loss_ce: 0.212887
iteration 1100 : loss : 0.258125, loss_ce: 0.320028
iteration 1110 : loss : 0.197252, loss_ce: 0.242545
iteration 1120 : loss : 0.200405, loss_ce: 0.253015
iteration 1130 : loss : 0.180233, loss_ce: 0.210245
iteration 1140 : loss : 0.185404, loss_ce: 0.227097
iteration 1150 : loss : 0.179439, loss_ce: 0.216343
iteration 1160 : loss : 0.180377, loss_ce: 0.213853
iteration 1170 : loss : 0.145791, loss_ce: 0.182530
iteration 1180 : loss : 0.195135, loss_ce: 0.247594
iteration 1190 : loss : 0.155346, loss_ce: 0.183014
iteration 1200 : loss : 0.228380, loss_ce: 0.265850
iteration 1210 : loss : 0.159668, loss_ce: 0.186420
iteration 1220 : loss : 0.166974, loss_ce: 0.194476
iteration 1230 : loss : 0.197009, loss_ce: 0.244469
iteration 1240 : loss : 0.160821, loss_ce: 0.192176
iteration 1250 : loss : 0.171389, loss_ce: 0.200223
iteration 1260 : loss : 0.163498, loss_ce: 0.178645
iteration 1270 : loss : 0.171598, loss_ce: 0.217591
iteration 1280 : loss : 0.168810, loss_ce: 0.194214
iteration 1290 : loss : 0.158936, loss_ce: 0.182542
iteration 1300 : loss : 0.152929, loss_ce: 0.179948
iteration 1310 : loss : 0.170747, loss_ce: 0.210832
iteration 1320 : loss : 0.179682, loss_ce: 0.219227
iteration 1330 : loss : 0.202571, loss_ce: 0.250114
iteration 1340 : loss : 0.187286, loss_ce: 0.231699
iteration 1350 : loss : 0.179228, loss_ce: 0.222194
iteration 1360 : loss : 0.169046, loss_ce: 0.207721
iteration 1370 : loss : 0.155983, loss_ce: 0.177197
iteration 1380 : loss : 0.199197, loss_ce: 0.256679
iteration 1390 : loss : 0.179060, loss_ce: 0.212368
iteration 1400 : loss : 0.178831, loss_ce: 0.221810
iteration 1410 : loss : 0.173423, loss_ce: 0.216880
iteration 1420 : loss : 0.183398, loss_ce: 0.220990
iteration 1430 : loss : 0.141734, loss_ce: 0.166175
iteration 1440 : loss : 0.168408, loss_ce: 0.198401
iteration 1450 : loss : 0.179937, loss_ce: 0.220221
iteration 1460 : loss : 0.152269, loss_ce: 0.161798
iteration 1470 : loss : 0.173141, loss_ce: 0.219761
iteration 1480 : loss : 0.153467, loss_ce: 0.177957
iteration 1490 : loss : 0.153130, loss_ce: 0.183448
iteration 1500 : loss : 0.252278, loss_ce: 0.312859
iteration 1510 : loss : 0.187955, loss_ce: 0.214262
iteration 1520 : loss : 0.150611, loss_ce: 0.178317
iteration 1530 : loss : 0.171578, loss_ce: 0.206280
iteration 1540 : loss : 0.184588, loss_ce: 0.222406
iteration 1550 : loss : 0.165336, loss_ce: 0.186370
iteration 1560 : loss : 0.137427, loss_ce: 0.166861
iteration 1570 : loss : 0.151750, loss_ce: 0.185742
iteration 1580 : loss : 0.146867, loss_ce: 0.161108
iteration 1590 : loss : 0.161126, loss_ce: 0.201162
iteration 1600 : loss : 0.170134, loss_ce: 0.197918
iteration 1610 : loss : 0.151966, loss_ce: 0.178201
iteration 1620 : loss : 0.197005, loss_ce: 0.248227
iteration 1630 : loss : 0.186573, loss_ce: 0.224932
iteration 1640 : loss : 0.153377, loss_ce: 0.173929
iteration 1650 : loss : 0.173921, loss_ce: 0.189104
iteration 1660 : loss : 0.146894, loss_ce: 0.172128
iteration 1670 : loss : 0.174645, loss_ce: 0.205740
iteration 1680 : loss : 0.149567, loss_ce: 0.186751
iteration 1690 : loss : 0.162901, loss_ce: 0.194508
iteration 1700 : loss : 0.178603, loss_ce: 0.227133
iteration 1710 : loss : 0.159511, loss_ce: 0.201643
iteration 1720 : loss : 0.149118, loss_ce: 0.176516
iteration 1730 : loss : 0.150821, loss_ce: 0.183148
iteration 1740 : loss : 0.143737, loss_ce: 0.176546
iteration 1750 : loss : 0.163035, loss_ce: 0.205633
iteration 1760 : loss : 0.161091, loss_ce: 0.197122
iteration 1770 : loss : 0.157320, loss_ce: 0.179112
iteration 1780 : loss : 0.149995, loss_ce: 0.180151
iteration 1790 : loss : 0.150067, loss_ce: 0.179141
iteration 1800 : loss : 0.201821, loss_ce: 0.249313
iteration 1810 : loss : 0.156339, loss_ce: 0.188611
iteration 1820 : loss : 0.158280, loss_ce: 0.180493
iteration 1830 : loss : 0.171094, loss_ce: 0.197447
iteration 1840 : loss : 0.153964, loss_ce: 0.187105
iteration 1850 : loss : 0.155911, loss_ce: 0.194407
iteration 1860 : loss : 0.159395, loss_ce: 0.196519
iteration 1870 : loss : 0.172997, loss_ce: 0.218635
iteration 1880 : loss : 0.147701, loss_ce: 0.183114
iteration 1890 : loss : 0.156490, loss_ce: 0.175430
iteration 1900 : loss : 0.157782, loss_ce: 0.191438
iteration 1910 : loss : 0.164478, loss_ce: 0.188315
iteration 1920 : loss : 0.133842, loss_ce: 0.151633
iteration 1930 : loss : 0.155579, loss_ce: 0.192765
iteration 1940 : loss : 0.165751, loss_ce: 0.205501
iteration 1950 : loss : 0.149776, loss_ce: 0.188986
iteration 1960 : loss : 0.152851, loss_ce: 0.179697
iteration 1970 : loss : 0.184177, loss_ce: 0.234849
iteration 1980 : loss : 0.153118, loss_ce: 0.184282
iteration 1990 : loss : 0.157322, loss_ce: 0.200337
iteration 2000 : loss : 0.172451, loss_ce: 0.172074
iteration 2010 : loss : 0.152206, loss_ce: 0.181684
iteration 2020 : loss : 0.151822, loss_ce: 0.190398
iteration 2030 : loss : 0.166832, loss_ce: 0.203242
iteration 2040 : loss : 0.165843, loss_ce: 0.185735
iteration 2050 : loss : 0.143704, loss_ce: 0.174149
iteration 2060 : loss : 0.162966, loss_ce: 0.206919
iteration 2070 : loss : 0.147184, loss_ce: 0.171992
iteration 2080 : loss : 0.153760, loss_ce: 0.185859
iteration 2090 : loss : 0.167172, loss_ce: 0.202048
iteration 2100 : loss : 0.205013, loss_ce: 0.264152
iteration 2110 : loss : 0.144865, loss_ce: 0.186279
iteration 2120 : loss : 0.155832, loss_ce: 0.189809
iteration 2130 : loss : 0.164428, loss_ce: 0.208761
iteration 2140 : loss : 0.153753, loss_ce: 0.182729
iteration 2150 : loss : 0.155416, loss_ce: 0.163270
iteration 2160 : loss : 0.128080, loss_ce: 0.150323
iteration 2170 : loss : 0.131390, loss_ce: 0.152038
iteration 2180 : loss : 0.123677, loss_ce: 0.142371
iteration 2190 : loss : 0.125801, loss_ce: 0.142139
iteration 2200 : loss : 0.168104, loss_ce: 0.212895
iteration 2210 : loss : 0.140444, loss_ce: 0.176591
iteration 2220 : loss : 0.156469, loss_ce: 0.194589
iteration 2230 : loss : 0.161884, loss_ce: 0.199729
iteration 2240 : loss : 0.174184, loss_ce: 0.211115
iteration 2250 : loss : 0.185773, loss_ce: 0.189475
iteration 2260 : loss : 0.150970, loss_ce: 0.181919
iteration 2270 : loss : 0.127634, loss_ce: 0.148461
iteration 2280 : loss : 0.156707, loss_ce: 0.196087
iteration 2290 : loss : 0.149712, loss_ce: 0.195421
iteration 2300 : loss : 0.133588, loss_ce: 0.160612
iteration 2310 : loss : 0.149479, loss_ce: 0.193109
iteration 2320 : loss : 0.147187, loss_ce: 0.167853
iteration 2330 : loss : 0.126025, loss_ce: 0.137756
iteration 2340 : loss : 0.124349, loss_ce: 0.157055
iteration 2350 : loss : 0.179670, loss_ce: 0.220855
iteration 2360 : loss : 0.161586, loss_ce: 0.194376
iteration 2370 : loss : 0.150664, loss_ce: 0.194330
iteration 2380 : loss : 0.118788, loss_ce: 0.134142
iteration 2390 : loss : 0.129645, loss_ce: 0.158928
iteration 2400 : loss : 0.147394, loss_ce: 0.172569
iteration 2410 : loss : 0.149473, loss_ce: 0.185319
iteration 2420 : loss : 0.159138, loss_ce: 0.186505
iteration 2430 : loss : 0.143047, loss_ce: 0.168784
iteration 2440 : loss : 0.147526, loss_ce: 0.184315
iteration 2450 : loss : 0.148257, loss_ce: 0.176039
iteration 2460 : loss : 0.184151, loss_ce: 0.228535
iteration 2470 : loss : 0.130763, loss_ce: 0.155238
iteration 2480 : loss : 0.157645, loss_ce: 0.203065
iteration 2490 : loss : 0.146159, loss_ce: 0.175782
iteration 2500 : loss : 0.132778, loss_ce: 0.163740
iteration 2510 : loss : 0.152308, loss_ce: 0.197459
iteration 2520 : loss : 0.176866, loss_ce: 0.219112
iteration 2530 : loss : 0.137329, loss_ce: 0.166985
iteration 2540 : loss : 0.138768, loss_ce: 0.160166
iteration 2550 : loss : 0.155396, loss_ce: 0.198376
iteration 2560 : loss : 0.138207, loss_ce: 0.165615
iteration 2570 : loss : 0.121258, loss_ce: 0.146939
iteration 2580 : loss : 0.137629, loss_ce: 0.168620
iteration 2590 : loss : 0.146506, loss_ce: 0.184083
iteration 2600 : loss : 0.156465, loss_ce: 0.177616
iteration 2610 : loss : 0.157283, loss_ce: 0.182893
iteration 2620 : loss : 0.140518, loss_ce: 0.169519
iteration 2630 : loss : 0.126353, loss_ce: 0.141385
iteration 2640 : loss : 0.134191, loss_ce: 0.154923
iteration 2650 : loss : 0.121364, loss_ce: 0.145136
iteration 2660 : loss : 0.145097, loss_ce: 0.175193
iteration 2670 : loss : 0.150683, loss_ce: 0.192007
iteration 2680 : loss : 0.168544, loss_ce: 0.208966
iteration 2690 : loss : 0.138882, loss_ce: 0.172999
iteration 2700 : loss : 0.125030, loss_ce: 0.138900
iteration 2710 : loss : 0.143386, loss_ce: 0.183220
iteration 2720 : loss : 0.139169, loss_ce: 0.164439
iteration 2730 : loss : 0.138192, loss_ce: 0.161964
iteration 2740 : loss : 0.150136, loss_ce: 0.189114
iteration 2750 : loss : 0.162040, loss_ce: 0.179460
iteration 2760 : loss : 0.138223, loss_ce: 0.173213
iteration 2770 : loss : 0.120451, loss_ce: 0.137277
iteration 2780 : loss : 0.147341, loss_ce: 0.173673
iteration 2790 : loss : 0.165696, loss_ce: 0.197485
iteration 2800 : loss : 0.147209, loss_ce: 0.177964
iteration 2810 : loss : 0.174339, loss_ce: 0.221234
iteration 2820 : loss : 0.166012, loss_ce: 0.215982
iteration 2830 : loss : 0.154921, loss_ce: 0.186515
iteration 2840 : loss : 0.124249, loss_ce: 0.137165
iteration 2850 : loss : 0.150744, loss_ce: 0.179873
iteration 2860 : loss : 0.155428, loss_ce: 0.197080
iteration 2870 : loss : 0.188417, loss_ce: 0.257254
iteration 2880 : loss : 0.145012, loss_ce: 0.175630
iteration 2890 : loss : 0.156089, loss_ce: 0.195294
iteration 2900 : loss : 0.151110, loss_ce: 0.188505
iteration 2910 : loss : 0.136103, loss_ce: 0.166780
iteration 2920 : loss : 0.132481, loss_ce: 0.152741
iteration 2930 : loss : 0.142139, loss_ce: 0.179499
iteration 2940 : loss : 0.118729, loss_ce: 0.147705
iteration 2950 : loss : 0.116075, loss_ce: 0.144209
iteration 2960 : loss : 0.108595, loss_ce: 0.133837
iteration 2970 : loss : 0.147951, loss_ce: 0.184179
iteration 2980 : loss : 0.157063, loss_ce: 0.176537
iteration 2990 : loss : 0.181870, loss_ce: 0.234823
iteration 3000 : loss : 0.128708, loss_ce: 0.157460
iteration 3010 : loss : 0.136163, loss_ce: 0.164923
iteration 3020 : loss : 0.142297, loss_ce: 0.166557
iteration 3030 : loss : 0.137621, loss_ce: 0.160329
iteration 3040 : loss : 0.118549, loss_ce: 0.141236
iteration 3050 : loss : 0.140575, loss_ce: 0.174677
iteration 3060 : loss : 0.137013, loss_ce: 0.159671
iteration 3070 : loss : 0.119667, loss_ce: 0.152801
iteration 3080 : loss : 0.172731, loss_ce: 0.223497
iteration 3090 : loss : 0.099919, loss_ce: 0.115618
iteration 3100 : loss : 0.135926, loss_ce: 0.166757
iteration 3110 : loss : 0.139225, loss_ce: 0.177851
iteration 3120 : loss : 0.152394, loss_ce: 0.190762
iteration 3130 : loss : 0.143934, loss_ce: 0.180965
iteration 3140 : loss : 0.150466, loss_ce: 0.196860
iteration 3150 : loss : 0.141108, loss_ce: 0.170404
iteration 3160 : loss : 0.140215, loss_ce: 0.177934
iteration 3170 : loss : 0.171456, loss_ce: 0.216242
iteration 3180 : loss : 0.129338, loss_ce: 0.151985
iteration 3190 : loss : 0.134514, loss_ce: 0.158083
iteration 3200 : loss : 0.126987, loss_ce: 0.153771
iteration 3210 : loss : 0.139819, loss_ce: 0.172194
iteration 3220 : loss : 0.121887, loss_ce: 0.151605
iteration 3230 : loss : 0.131339, loss_ce: 0.170204
iteration 3240 : loss : 0.120098, loss_ce: 0.149165
iteration 3250 : loss : 0.173852, loss_ce: 0.214871
iteration 3260 : loss : 0.148439, loss_ce: 0.166423
iteration 3270 : loss : 0.119295, loss_ce: 0.139306
iteration 3280 : loss : 0.115043, loss_ce: 0.148194
iteration 3290 : loss : 0.141673, loss_ce: 0.174449
iteration 3300 : loss : 0.123628, loss_ce: 0.137120
iteration 3310 : loss : 0.109024, loss_ce: 0.134446
iteration 3320 : loss : 0.132862, loss_ce: 0.164133
iteration 3330 : loss : 0.165598, loss_ce: 0.207269
iteration 3340 : loss : 0.152173, loss_ce: 0.185903
iteration 3350 : loss : 0.141158, loss_ce: 0.181285
iteration 3360 : loss : 0.107792, loss_ce: 0.132854
iteration 3370 : loss : 0.109871, loss_ce: 0.133359
iteration 3380 : loss : 0.140910, loss_ce: 0.178085
iteration 3390 : loss : 0.128819, loss_ce: 0.158937
iteration 3400 : loss : 0.143894, loss_ce: 0.181838
iteration 3410 : loss : 0.132635, loss_ce: 0.153296
iteration 3420 : loss : 0.117686, loss_ce: 0.138058
iteration 3430 : loss : 0.152625, loss_ce: 0.191958
iteration 3440 : loss : 0.120543, loss_ce: 0.147168
iteration 3450 : loss : 0.129081, loss_ce: 0.152879
iteration 3460 : loss : 0.141535, loss_ce: 0.156027
iteration 3470 : loss : 0.132588, loss_ce: 0.160829
iteration 3480 : loss : 0.137007, loss_ce: 0.171075
iteration 3490 : loss : 0.151011, loss_ce: 0.191961
iteration 3500 : loss : 0.113651, loss_ce: 0.112321
iteration 3510 : loss : 0.142829, loss_ce: 0.158991
iteration 3520 : loss : 0.156333, loss_ce: 0.201156
iteration 3530 : loss : 0.132488, loss_ce: 0.164568
iteration 3540 : loss : 0.131296, loss_ce: 0.161802
iteration 3550 : loss : 0.148189, loss_ce: 0.190063
iteration 3560 : loss : 0.110935, loss_ce: 0.130662
iteration 3570 : loss : 0.110816, loss_ce: 0.141116
iteration 3580 : loss : 0.122273, loss_ce: 0.158248
iteration 3590 : loss : 0.134085, loss_ce: 0.161306
iteration 3600 : loss : 0.113243, loss_ce: 0.134094
iteration 3610 : loss : 0.122231, loss_ce: 0.138563
iteration 3620 : loss : 0.153460, loss_ce: 0.199976
iteration 3630 : loss : 0.142472, loss_ce: 0.184160
iteration 3640 : loss : 0.135336, loss_ce: 0.150750
iteration 3650 : loss : 0.122537, loss_ce: 0.152237
iteration 3660 : loss : 0.156903, loss_ce: 0.180064
iteration 3670 : loss : 0.135537, loss_ce: 0.160238
iteration 3680 : loss : 0.127540, loss_ce: 0.148726
iteration 3690 : loss : 0.105657, loss_ce: 0.122841
iteration 3700 : loss : 0.151236, loss_ce: 0.194750
iteration 3710 : loss : 0.137511, loss_ce: 0.177146
iteration 3720 : loss : 0.139362, loss_ce: 0.166809
iteration 3730 : loss : 0.115611, loss_ce: 0.141552
iteration 3740 : loss : 0.117165, loss_ce: 0.139908
iteration 3750 : loss : 0.118369, loss_ce: 0.148879
iteration 3760 : loss : 0.127699, loss_ce: 0.161241
iteration 3770 : loss : 0.127217, loss_ce: 0.151117
iteration 3780 : loss : 0.133513, loss_ce: 0.171123
iteration 3790 : loss : 0.118957, loss_ce: 0.137007
iteration 3800 : loss : 0.117534, loss_ce: 0.138586
iteration 3810 : loss : 0.115033, loss_ce: 0.142838
iteration 3820 : loss : 0.140590, loss_ce: 0.169340
iteration 3830 : loss : 0.134565, loss_ce: 0.151488
iteration 3840 : loss : 0.129432, loss_ce: 0.156876
iteration 3850 : loss : 0.126814, loss_ce: 0.152696
iteration 3860 : loss : 0.133126, loss_ce: 0.162847
iteration 3870 : loss : 0.113741, loss_ce: 0.133882
iteration 3880 : loss : 0.134455, loss_ce: 0.141383
iteration 3890 : loss : 0.144357, loss_ce: 0.186364
iteration 3900 : loss : 0.122137, loss_ce: 0.144869
iteration 3910 : loss : 0.167036, loss_ce: 0.210711
iteration 3920 : loss : 0.132062, loss_ce: 0.164024
iteration 3930 : loss : 0.159093, loss_ce: 0.199515
iteration 3940 : loss : 0.121027, loss_ce: 0.134392
iteration 3950 : loss : 0.114049, loss_ce: 0.143787
iteration 3960 : loss : 0.176792, loss_ce: 0.234764
iteration 3970 : loss : 0.120040, loss_ce: 0.136344
iteration 3980 : loss : 0.144031, loss_ce: 0.185956
iteration 3990 : loss : 0.126621, loss_ce: 0.132754
iteration 4000 : loss : 0.143560, loss_ce: 0.195910
iteration 4010 : loss : 0.120002, loss_ce: 0.144525
iteration 4020 : loss : 0.141839, loss_ce: 0.178676
iteration 4030 : loss : 0.155603, loss_ce: 0.207802
iteration 4040 : loss : 0.102861, loss_ce: 0.126258
iteration 4050 : loss : 0.122243, loss_ce: 0.144322
iteration 4060 : loss : 0.091037, loss_ce: 0.103384
iteration 4070 : loss : 0.129153, loss_ce: 0.164820
iteration 4080 : loss : 0.119284, loss_ce: 0.133608
iteration 4090 : loss : 0.129769, loss_ce: 0.140768
iteration 4100 : loss : 0.122566, loss_ce: 0.156385
iteration 4110 : loss : 0.151877, loss_ce: 0.189081
iteration 4120 : loss : 0.117048, loss_ce: 0.142109
iteration 4130 : loss : 0.110472, loss_ce: 0.138764
iteration 4140 : loss : 0.116738, loss_ce: 0.133504
iteration 4150 : loss : 0.145730, loss_ce: 0.183727
iteration 4160 : loss : 0.118281, loss_ce: 0.145384
iteration 4170 : loss : 0.137890, loss_ce: 0.159307
iteration 4180 : loss : 0.123907, loss_ce: 0.153481
iteration 4190 : loss : 0.127260, loss_ce: 0.164565
iteration 4200 : loss : 0.110732, loss_ce: 0.125762
iteration 4210 : loss : 0.143986, loss_ce: 0.179820
iteration 4220 : loss : 0.125120, loss_ce: 0.148233
iteration 4230 : loss : 0.097114, loss_ce: 0.116594
iteration 4240 : loss : 0.113016, loss_ce: 0.130493
iteration 4250 : loss : 0.098822, loss_ce: 0.127056
iteration 4260 : loss : 0.111727, loss_ce: 0.138495
iteration 4270 : loss : 0.114439, loss_ce: 0.131057
iteration 4280 : loss : 0.144935, loss_ce: 0.191598
iteration 4290 : loss : 0.133130, loss_ce: 0.169959
iteration 4300 : loss : 0.128658, loss_ce: 0.163912
iteration 4310 : loss : 0.110300, loss_ce: 0.134108
iteration 4320 : loss : 0.101082, loss_ce: 0.112039
iteration 4330 : loss : 0.091877, loss_ce: 0.108069
iteration 4340 : loss : 0.123878, loss_ce: 0.155889
iteration 4350 : loss : 0.173797, loss_ce: 0.242269
iteration 4360 : loss : 0.115839, loss_ce: 0.142456
iteration 4370 : loss : 0.144541, loss_ce: 0.185028
iteration 4380 : loss : 0.128266, loss_ce: 0.158897
iteration 4390 : loss : 0.121240, loss_ce: 0.151298
iteration 4400 : loss : 0.139106, loss_ce: 0.178968
iteration 4410 : loss : 0.143477, loss_ce: 0.192823
iteration 4420 : loss : 0.129697, loss_ce: 0.165015
iteration 4430 : loss : 0.129299, loss_ce: 0.150114
iteration 4440 : loss : 0.133932, loss_ce: 0.168165
iteration 4450 : loss : 0.150331, loss_ce: 0.179035
iteration 4460 : loss : 0.134502, loss_ce: 0.168363
iteration 4470 : loss : 0.122922, loss_ce: 0.147724
iteration 4480 : loss : 0.127204, loss_ce: 0.164350
iteration 4490 : loss : 0.101349, loss_ce: 0.115462
iteration 4500 : loss : 0.125536, loss_ce: 0.171405
iteration 4510 : loss : 0.142797, loss_ce: 0.186181
iteration 4520 : loss : 0.166582, loss_ce: 0.212427
iteration 4530 : loss : 0.137479, loss_ce: 0.173866
iteration 4540 : loss : 0.129922, loss_ce: 0.156234
iteration 4550 : loss : 0.135904, loss_ce: 0.153074
iteration 4560 : loss : 0.130035, loss_ce: 0.158643
iteration 4570 : loss : 0.128521, loss_ce: 0.160670
iteration 4580 : loss : 0.138316, loss_ce: 0.170779
iteration 4590 : loss : 0.121758, loss_ce: 0.143205
iteration 4600 : loss : 0.124293, loss_ce: 0.147549
iteration 4610 : loss : 0.130381, loss_ce: 0.160796
iteration 4620 : loss : 0.118347, loss_ce: 0.152102
iteration 4630 : loss : 0.143081, loss_ce: 0.191820
iteration 4640 : loss : 0.127386, loss_ce: 0.163459
iteration 4650 : loss : 0.109801, loss_ce: 0.129891
iteration 4660 : loss : 0.120090, loss_ce: 0.155997
iteration 4670 : loss : 0.147363, loss_ce: 0.182566
iteration 4680 : loss : 0.119668, loss_ce: 0.152482
iteration 4690 : loss : 0.102131, loss_ce: 0.124707
iteration 4700 : loss : 0.122549, loss_ce: 0.151529
iteration 4710 : loss : 0.113740, loss_ce: 0.138969
iteration 4720 : loss : 0.131727, loss_ce: 0.160798
iteration 4730 : loss : 0.119113, loss_ce: 0.143263
iteration 4740 : loss : 0.115843, loss_ce: 0.146097
iteration 4750 : loss : 0.184091, loss_ce: 0.203325
iteration 4760 : loss : 0.109944, loss_ce: 0.131668
iteration 4770 : loss : 0.163340, loss_ce: 0.209889
iteration 4780 : loss : 0.126126, loss_ce: 0.147611
iteration 4790 : loss : 0.117046, loss_ce: 0.148818
iteration 4800 : loss : 0.123652, loss_ce: 0.153907
iteration 4810 : loss : 0.123298, loss_ce: 0.146975
iteration 4820 : loss : 0.121187, loss_ce: 0.126883
iteration 4830 : loss : 0.120594, loss_ce: 0.140078
iteration 4840 : loss : 0.125474, loss_ce: 0.159889
iteration 4850 : loss : 0.126095, loss_ce: 0.162708
iteration 4860 : loss : 0.107752, loss_ce: 0.131507
iteration 4870 : loss : 0.117740, loss_ce: 0.146291
iteration 4880 : loss : 0.107939, loss_ce: 0.138984
iteration 4890 : loss : 0.111279, loss_ce: 0.138337
iteration 4900 : loss : 0.105693, loss_ce: 0.124305
iteration 4910 : loss : 0.106075, loss_ce: 0.123986
iteration 4920 : loss : 0.111962, loss_ce: 0.129951
iteration 4930 : loss : 0.106150, loss_ce: 0.133392
iteration 4940 : loss : 0.132852, loss_ce: 0.169257
iteration 4950 : loss : 0.118361, loss_ce: 0.143444
iteration 4960 : loss : 0.110912, loss_ce: 0.128192
iteration 4970 : loss : 0.110691, loss_ce: 0.141839
iteration 4980 : loss : 0.109071, loss_ce: 0.132727
iteration 4990 : loss : 0.117557, loss_ce: 0.143254
iteration 5000 : loss : 0.123542, loss_ce: 0.153021
iteration 5010 : loss : 0.095667, loss_ce: 0.104485
iteration 5020 : loss : 0.120676, loss_ce: 0.139621
iteration 5030 : loss : 0.113831, loss_ce: 0.144049
iteration 5040 : loss : 0.110923, loss_ce: 0.131858
iteration 5050 : loss : 0.121853, loss_ce: 0.156782
iteration 5060 : loss : 0.117998, loss_ce: 0.148413
iteration 5070 : loss : 0.114370, loss_ce: 0.143706
iteration 5080 : loss : 0.123490, loss_ce: 0.143680
iteration 5090 : loss : 0.121321, loss_ce: 0.146236
iteration 5100 : loss : 0.137482, loss_ce: 0.168496
iteration 5110 : loss : 0.114910, loss_ce: 0.143895
iteration 5120 : loss : 0.121721, loss_ce: 0.150157
iteration 5130 : loss : 0.132670, loss_ce: 0.171489
iteration 5140 : loss : 0.105485, loss_ce: 0.124225
iteration 5150 : loss : 0.116671, loss_ce: 0.137609
iteration 5160 : loss : 0.106883, loss_ce: 0.137500
iteration 5170 : loss : 0.107928, loss_ce: 0.128597
iteration 5180 : loss : 0.142686, loss_ce: 0.178644
iteration 5190 : loss : 0.116949, loss_ce: 0.147906
iteration 5200 : loss : 0.118482, loss_ce: 0.152155
iteration 5210 : loss : 0.129186, loss_ce: 0.165796
iteration 5220 : loss : 0.135880, loss_ce: 0.159184
iteration 5230 : loss : 0.120869, loss_ce: 0.150028
iteration 5240 : loss : 0.101240, loss_ce: 0.122978
iteration 5250 : loss : 0.078907, loss_ce: 0.080493
iteration 5260 : loss : 0.123135, loss_ce: 0.151494
iteration 5270 : loss : 0.125019, loss_ce: 0.158179
iteration 5280 : loss : 0.103107, loss_ce: 0.119954
iteration 5290 : loss : 0.117743, loss_ce: 0.148700
iteration 5300 : loss : 0.155124, loss_ce: 0.201566
iteration 5310 : loss : 0.116892, loss_ce: 0.142722
iteration 5320 : loss : 0.120828, loss_ce: 0.138780
iteration 5330 : loss : 0.099739, loss_ce: 0.122534
iteration 5340 : loss : 0.117081, loss_ce: 0.142902
iteration 5350 : loss : 0.107851, loss_ce: 0.133204
iteration 5360 : loss : 0.122144, loss_ce: 0.147259
iteration 5370 : loss : 0.101729, loss_ce: 0.124264
iteration 5380 : loss : 0.108096, loss_ce: 0.126647
iteration 5390 : loss : 0.125839, loss_ce: 0.165462
iteration 5400 : loss : 0.101617, loss_ce: 0.120327
iteration 5410 : loss : 0.114923, loss_ce: 0.139907
iteration 5420 : loss : 0.093157, loss_ce: 0.094876
iteration 5430 : loss : 0.128488, loss_ce: 0.172775
iteration 5440 : loss : 0.073646, loss_ce: 0.085357
iteration 5450 : loss : 0.115801, loss_ce: 0.150583
iteration 5460 : loss : 0.126177, loss_ce: 0.162867
iteration 5470 : loss : 0.156598, loss_ce: 0.204407
iteration 5480 : loss : 0.101732, loss_ce: 0.121851
iteration 5490 : loss : 0.102098, loss_ce: 0.124483
iteration 5500 : loss : 0.132767, loss_ce: 0.107060
iteration 5510 : loss : 0.098253, loss_ce: 0.114319
iteration 5520 : loss : 0.109183, loss_ce: 0.134961
iteration 5530 : loss : 0.126892, loss_ce: 0.152996
iteration 5540 : loss : 0.098990, loss_ce: 0.115150
iteration 5550 : loss : 0.101534, loss_ce: 0.113431
iteration 5560 : loss : 0.114644, loss_ce: 0.142925
iteration 5570 : loss : 0.114383, loss_ce: 0.138787
iteration 5580 : loss : 0.104246, loss_ce: 0.126862
iteration 5590 : loss : 0.118061, loss_ce: 0.155779
iteration 5600 : loss : 0.123311, loss_ce: 0.163650
iteration 5610 : loss : 0.102641, loss_ce: 0.131498
iteration 5620 : loss : 0.119503, loss_ce: 0.148475
iteration 5630 : loss : 0.117346, loss_ce: 0.132718
iteration 5640 : loss : 0.115054, loss_ce: 0.145918
iteration 5650 : loss : 0.101578, loss_ce: 0.119622
iteration 5660 : loss : 0.100288, loss_ce: 0.128063
iteration 5670 : loss : 0.117952, loss_ce: 0.144642
iteration 5680 : loss : 0.114499, loss_ce: 0.145402
iteration 5690 : loss : 0.133461, loss_ce: 0.179263
iteration 5700 : loss : 0.121755, loss_ce: 0.141186
iteration 5710 : loss : 0.116914, loss_ce: 0.143054
iteration 5720 : loss : 0.090847, loss_ce: 0.114031
iteration 5730 : loss : 0.125316, loss_ce: 0.158227
iteration 5740 : loss : 0.092816, loss_ce: 0.116250
iteration 5750 : loss : 0.114226, loss_ce: 0.134633
iteration 5760 : loss : 0.090892, loss_ce: 0.107630
iteration 5770 : loss : 0.106424, loss_ce: 0.126621
iteration 5780 : loss : 0.096488, loss_ce: 0.118481
iteration 5790 : loss : 0.092673, loss_ce: 0.102341
iteration 5800 : loss : 0.107170, loss_ce: 0.129360
iteration 5810 : loss : 0.113334, loss_ce: 0.138858
iteration 5820 : loss : 0.119754, loss_ce: 0.142068
iteration 5830 : loss : 0.107074, loss_ce: 0.125927
iteration 5840 : loss : 0.137078, loss_ce: 0.171291
iteration 5850 : loss : 0.114197, loss_ce: 0.153946
iteration 5860 : loss : 0.111141, loss_ce: 0.142323
iteration 5870 : loss : 0.105295, loss_ce: 0.143670
iteration 5880 : loss : 0.100658, loss_ce: 0.122268
iteration 5890 : loss : 0.109694, loss_ce: 0.143002
iteration 5900 : loss : 0.103372, loss_ce: 0.129898
iteration 5910 : loss : 0.158243, loss_ce: 0.208524
iteration 5920 : loss : 0.111946, loss_ce: 0.133743
iteration 5930 : loss : 0.096535, loss_ce: 0.111055
iteration 5940 : loss : 0.103193, loss_ce: 0.123513
iteration 5950 : loss : 0.111230, loss_ce: 0.136320
iteration 5960 : loss : 0.117968, loss_ce: 0.146263
iteration 5970 : loss : 0.114216, loss_ce: 0.140472
iteration 5980 : loss : 0.100333, loss_ce: 0.121627
iteration 5990 : loss : 0.122838, loss_ce: 0.153538
iteration 6000 : loss : 0.114623, loss_ce: 0.155179
iteration 6010 : loss : 0.113457, loss_ce: 0.136613
iteration 6020 : loss : 0.115566, loss_ce: 0.128887
iteration 6030 : loss : 0.115324, loss_ce: 0.137830
iteration 6040 : loss : 0.115866, loss_ce: 0.141779
iteration 6050 : loss : 0.108289, loss_ce: 0.125446
iteration 6060 : loss : 0.124229, loss_ce: 0.157348
iteration 6070 : loss : 0.097802, loss_ce: 0.105824
iteration 6080 : loss : 0.094934, loss_ce: 0.117631
iteration 6090 : loss : 0.098102, loss_ce: 0.125569
iteration 6100 : loss : 0.096059, loss_ce: 0.110896
iteration 6110 : loss : 0.117914, loss_ce: 0.149168
iteration 6120 : loss : 0.103302, loss_ce: 0.128757
iteration 6130 : loss : 0.103017, loss_ce: 0.128961
iteration 6140 : loss : 0.097018, loss_ce: 0.121900
iteration 6150 : loss : 0.116932, loss_ce: 0.147791
iteration 6160 : loss : 0.120819, loss_ce: 0.153340
iteration 6170 : loss : 0.132531, loss_ce: 0.176021
iteration 6180 : loss : 0.103005, loss_ce: 0.119543
iteration 6190 : loss : 0.100190, loss_ce: 0.122121
iteration 6200 : loss : 0.109029, loss_ce: 0.122575
iteration 6210 : loss : 0.123644, loss_ce: 0.150564
iteration 6220 : loss : 0.102574, loss_ce: 0.124006
iteration 6230 : loss : 0.108485, loss_ce: 0.136350
iteration 6240 : loss : 0.104229, loss_ce: 0.127477
iteration 6250 : loss : 0.181402, loss_ce: 0.243761
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_49_iter_6250.pth
iteration 6260 : loss : 0.121899, loss_ce: 0.141202
iteration 6270 : loss : 0.109777, loss_ce: 0.129112
iteration 6280 : loss : 0.105269, loss_ce: 0.129465
iteration 6290 : loss : 0.096844, loss_ce: 0.127559
iteration 6300 : loss : 0.091233, loss_ce: 0.105475
iteration 6310 : loss : 0.098267, loss_ce: 0.117613
iteration 6320 : loss : 0.112006, loss_ce: 0.151657
iteration 6330 : loss : 0.089082, loss_ce: 0.095855
iteration 6340 : loss : 0.122712, loss_ce: 0.154926
iteration 6350 : loss : 0.102726, loss_ce: 0.126547
iteration 6360 : loss : 0.102276, loss_ce: 0.115226
iteration 6370 : loss : 0.098967, loss_ce: 0.122772
iteration 6380 : loss : 0.113151, loss_ce: 0.143865
iteration 6390 : loss : 0.107035, loss_ce: 0.142234
iteration 6400 : loss : 0.121536, loss_ce: 0.156378
iteration 6410 : loss : 0.091311, loss_ce: 0.104370
iteration 6420 : loss : 0.112488, loss_ce: 0.154054
iteration 6430 : loss : 0.100557, loss_ce: 0.118879
iteration 6440 : loss : 0.126136, loss_ce: 0.147122
iteration 6450 : loss : 0.112652, loss_ce: 0.140277
iteration 6460 : loss : 0.098465, loss_ce: 0.111840
iteration 6470 : loss : 0.104193, loss_ce: 0.127842
iteration 6480 : loss : 0.095857, loss_ce: 0.110402
iteration 6490 : loss : 0.111277, loss_ce: 0.138832
iteration 6500 : loss : 0.131065, loss_ce: 0.167776
iteration 6510 : loss : 0.133994, loss_ce: 0.185136
iteration 6520 : loss : 0.092651, loss_ce: 0.103962
iteration 6530 : loss : 0.100070, loss_ce: 0.127644
iteration 6540 : loss : 0.111374, loss_ce: 0.140859
iteration 6550 : loss : 0.128961, loss_ce: 0.167389
iteration 6560 : loss : 0.111599, loss_ce: 0.133198
iteration 6570 : loss : 0.101989, loss_ce: 0.125766
iteration 6580 : loss : 0.116640, loss_ce: 0.143067
iteration 6590 : loss : 0.096685, loss_ce: 0.111073
iteration 6600 : loss : 0.106663, loss_ce: 0.129812
iteration 6610 : loss : 0.118632, loss_ce: 0.149770
iteration 6620 : loss : 0.103968, loss_ce: 0.125205
iteration 6630 : loss : 0.111217, loss_ce: 0.136429
iteration 6640 : loss : 0.096244, loss_ce: 0.117870
iteration 6650 : loss : 0.104962, loss_ce: 0.126086
iteration 6660 : loss : 0.094168, loss_ce: 0.106830
iteration 6670 : loss : 0.101028, loss_ce: 0.113221
iteration 6680 : loss : 0.091290, loss_ce: 0.104332
iteration 6690 : loss : 0.089394, loss_ce: 0.098761
iteration 6700 : loss : 0.097017, loss_ce: 0.113861
iteration 6710 : loss : 0.132045, loss_ce: 0.162465
iteration 6720 : loss : 0.103505, loss_ce: 0.124583
iteration 6730 : loss : 0.130346, loss_ce: 0.166512
iteration 6740 : loss : 0.108160, loss_ce: 0.126331
iteration 6750 : loss : 0.141116, loss_ce: 0.171852
iteration 6760 : loss : 0.101797, loss_ce: 0.122529
iteration 6770 : loss : 0.103538, loss_ce: 0.124701
iteration 6780 : loss : 0.169018, loss_ce: 0.223212
iteration 6790 : loss : 0.168493, loss_ce: 0.221451
iteration 6800 : loss : 0.118151, loss_ce: 0.140979
iteration 6810 : loss : 0.138472, loss_ce: 0.175670
iteration 6820 : loss : 0.120171, loss_ce: 0.156341
iteration 6830 : loss : 0.105443, loss_ce: 0.123632
iteration 6840 : loss : 0.111671, loss_ce: 0.142504
iteration 6850 : loss : 0.122920, loss_ce: 0.149699
iteration 6860 : loss : 0.097091, loss_ce: 0.118146
iteration 6870 : loss : 0.096673, loss_ce: 0.112888
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_54_iter_6875.pth
iteration 6880 : loss : 0.113384, loss_ce: 0.144677
iteration 6890 : loss : 0.116062, loss_ce: 0.151867
iteration 6900 : loss : 0.080560, loss_ce: 0.098387
iteration 6910 : loss : 0.133665, loss_ce: 0.170193
iteration 6920 : loss : 0.107589, loss_ce: 0.139691
iteration 6930 : loss : 0.125074, loss_ce: 0.151551
iteration 6940 : loss : 0.098827, loss_ce: 0.119561
iteration 6950 : loss : 0.079655, loss_ce: 0.092727
iteration 6960 : loss : 0.123114, loss_ce: 0.151649
iteration 6970 : loss : 0.106870, loss_ce: 0.125233
iteration 6980 : loss : 0.114672, loss_ce: 0.135970
iteration 6990 : loss : 0.119660, loss_ce: 0.148649
iteration 7000 : loss : 0.148233, loss_ce: 0.213257
iteration 7010 : loss : 0.149570, loss_ce: 0.207463
iteration 7020 : loss : 0.107711, loss_ce: 0.133094
iteration 7030 : loss : 0.095389, loss_ce: 0.119515
iteration 7040 : loss : 0.110773, loss_ce: 0.136251
iteration 7050 : loss : 0.087094, loss_ce: 0.093609
iteration 7060 : loss : 0.115236, loss_ce: 0.146441
iteration 7070 : loss : 0.112309, loss_ce: 0.142184
iteration 7080 : loss : 0.092524, loss_ce: 0.108701
iteration 7090 : loss : 0.099755, loss_ce: 0.125760
iteration 7100 : loss : 0.104730, loss_ce: 0.135007
iteration 7110 : loss : 0.121342, loss_ce: 0.154488
iteration 7120 : loss : 0.119527, loss_ce: 0.155535
iteration 7130 : loss : 0.106550, loss_ce: 0.142982
iteration 7140 : loss : 0.124931, loss_ce: 0.155434
iteration 7150 : loss : 0.079953, loss_ce: 0.097908
iteration 7160 : loss : 0.090501, loss_ce: 0.101244
iteration 7170 : loss : 0.123605, loss_ce: 0.150957
iteration 7180 : loss : 0.107960, loss_ce: 0.139749
iteration 7190 : loss : 0.104904, loss_ce: 0.134307
iteration 7200 : loss : 0.094439, loss_ce: 0.115771
iteration 7210 : loss : 0.125113, loss_ce: 0.155689
iteration 7220 : loss : 0.101494, loss_ce: 0.115538
iteration 7230 : loss : 0.100545, loss_ce: 0.120914
iteration 7240 : loss : 0.111357, loss_ce: 0.144118
iteration 7250 : loss : 0.101963, loss_ce: 0.126287
iteration 7260 : loss : 0.108586, loss_ce: 0.132402
iteration 7270 : loss : 0.094398, loss_ce: 0.115266
iteration 7280 : loss : 0.088693, loss_ce: 0.115847
iteration 7290 : loss : 0.112070, loss_ce: 0.147122
iteration 7300 : loss : 0.115278, loss_ce: 0.139423
iteration 7310 : loss : 0.075520, loss_ce: 0.088969
iteration 7320 : loss : 0.102663, loss_ce: 0.123870
iteration 7330 : loss : 0.126493, loss_ce: 0.157384
iteration 7340 : loss : 0.116765, loss_ce: 0.161091
iteration 7350 : loss : 0.095397, loss_ce: 0.119536
iteration 7360 : loss : 0.098260, loss_ce: 0.097171
iteration 7370 : loss : 0.122811, loss_ce: 0.157212
iteration 7380 : loss : 0.135894, loss_ce: 0.178298
iteration 7390 : loss : 0.124492, loss_ce: 0.157610
iteration 7400 : loss : 0.104384, loss_ce: 0.125637
iteration 7410 : loss : 0.107911, loss_ce: 0.137419
iteration 7420 : loss : 0.091649, loss_ce: 0.113773
iteration 7430 : loss : 0.092718, loss_ce: 0.116996
iteration 7440 : loss : 0.103905, loss_ce: 0.123902
iteration 7450 : loss : 0.102279, loss_ce: 0.125494
iteration 7460 : loss : 0.087753, loss_ce: 0.109192
iteration 7470 : loss : 0.084189, loss_ce: 0.095699
iteration 7480 : loss : 0.112605, loss_ce: 0.145005
iteration 7490 : loss : 0.111185, loss_ce: 0.141826
iteration 7500 : loss : 0.109236, loss_ce: 0.132971
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_59_iter_7500.pth
iteration 7510 : loss : 0.094588, loss_ce: 0.116947
iteration 7520 : loss : 0.109587, loss_ce: 0.136489
iteration 7530 : loss : 0.092494, loss_ce: 0.115313
iteration 7540 : loss : 0.110538, loss_ce: 0.146011
iteration 7550 : loss : 0.087469, loss_ce: 0.102880
iteration 7560 : loss : 0.094792, loss_ce: 0.123531
iteration 7570 : loss : 0.090438, loss_ce: 0.108974
iteration 7580 : loss : 0.094824, loss_ce: 0.110255
iteration 7590 : loss : 0.099546, loss_ce: 0.128375
iteration 7600 : loss : 0.099030, loss_ce: 0.122072
iteration 7610 : loss : 0.083518, loss_ce: 0.104195
iteration 7620 : loss : 0.106669, loss_ce: 0.131127
iteration 7630 : loss : 0.109540, loss_ce: 0.129692
iteration 7640 : loss : 0.091337, loss_ce: 0.110181
iteration 7650 : loss : 0.108766, loss_ce: 0.138289
iteration 7660 : loss : 0.123230, loss_ce: 0.158019
iteration 7670 : loss : 0.079997, loss_ce: 0.096661
iteration 7680 : loss : 0.105704, loss_ce: 0.132714
iteration 7690 : loss : 0.083178, loss_ce: 0.103218
iteration 7700 : loss : 0.099347, loss_ce: 0.121944
iteration 7710 : loss : 0.105077, loss_ce: 0.132197
iteration 7720 : loss : 0.096665, loss_ce: 0.096754
iteration 7730 : loss : 0.085659, loss_ce: 0.103650
iteration 7740 : loss : 0.116015, loss_ce: 0.142657
iteration 7750 : loss : 0.172828, loss_ce: 0.225995
iteration 7760 : loss : 0.139269, loss_ce: 0.188169
iteration 7770 : loss : 0.131355, loss_ce: 0.164982
iteration 7780 : loss : 0.094839, loss_ce: 0.117302
iteration 7790 : loss : 0.103572, loss_ce: 0.133076
iteration 7800 : loss : 0.105767, loss_ce: 0.133080
iteration 7810 : loss : 0.086941, loss_ce: 0.112707
iteration 7820 : loss : 0.122630, loss_ce: 0.159203
iteration 7830 : loss : 0.107727, loss_ce: 0.137915
iteration 7840 : loss : 0.116719, loss_ce: 0.150599
iteration 7850 : loss : 0.073005, loss_ce: 0.090394
iteration 7860 : loss : 0.112744, loss_ce: 0.144757
iteration 7870 : loss : 0.094258, loss_ce: 0.104475
iteration 7880 : loss : 0.103541, loss_ce: 0.128465
iteration 7890 : loss : 0.090771, loss_ce: 0.108725
iteration 7900 : loss : 0.119772, loss_ce: 0.154634
iteration 7910 : loss : 0.089007, loss_ce: 0.112470
iteration 7920 : loss : 0.088165, loss_ce: 0.110012
iteration 7930 : loss : 0.139958, loss_ce: 0.185741
iteration 7940 : loss : 0.089105, loss_ce: 0.109800
iteration 7950 : loss : 0.087655, loss_ce: 0.113937
iteration 7960 : loss : 0.087730, loss_ce: 0.110961
iteration 7970 : loss : 0.100344, loss_ce: 0.126301
iteration 7980 : loss : 0.122184, loss_ce: 0.162444
iteration 7990 : loss : 0.100666, loss_ce: 0.123825
iteration 8000 : loss : 0.109982, loss_ce: 0.119499
iteration 8010 : loss : 0.083189, loss_ce: 0.102415
iteration 8020 : loss : 0.114288, loss_ce: 0.141078
iteration 8030 : loss : 0.104659, loss_ce: 0.126169
iteration 8040 : loss : 0.079669, loss_ce: 0.094697
iteration 8050 : loss : 0.092772, loss_ce: 0.119777
iteration 8060 : loss : 0.101634, loss_ce: 0.131218
iteration 8070 : loss : 0.103328, loss_ce: 0.109847
iteration 8080 : loss : 0.092310, loss_ce: 0.115856
iteration 8090 : loss : 0.115102, loss_ce: 0.148404
iteration 8100 : loss : 0.090969, loss_ce: 0.099115
iteration 8110 : loss : 0.103911, loss_ce: 0.134902
iteration 8120 : loss : 0.117180, loss_ce: 0.148632
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_64_iter_8125.pth
iteration 8130 : loss : 0.080964, loss_ce: 0.088248
iteration 8140 : loss : 0.112762, loss_ce: 0.149018
iteration 8150 : loss : 0.120082, loss_ce: 0.157644
iteration 8160 : loss : 0.105806, loss_ce: 0.134045
iteration 8170 : loss : 0.093422, loss_ce: 0.105539
iteration 8180 : loss : 0.116185, loss_ce: 0.150314
iteration 8190 : loss : 0.095286, loss_ce: 0.118673
iteration 8200 : loss : 0.115211, loss_ce: 0.142501
iteration 8210 : loss : 0.090720, loss_ce: 0.109714
iteration 8220 : loss : 0.112641, loss_ce: 0.144559
iteration 8230 : loss : 0.090179, loss_ce: 0.116658
iteration 8240 : loss : 0.080772, loss_ce: 0.105777
iteration 8250 : loss : 0.119051, loss_ce: 0.152960
iteration 8260 : loss : 0.091240, loss_ce: 0.111370
iteration 8270 : loss : 0.096706, loss_ce: 0.121861
iteration 8280 : loss : 0.081842, loss_ce: 0.099861
iteration 8290 : loss : 0.099625, loss_ce: 0.131209
iteration 8300 : loss : 0.121039, loss_ce: 0.152191
iteration 8310 : loss : 0.106822, loss_ce: 0.129062
iteration 8320 : loss : 0.102092, loss_ce: 0.132168
iteration 8330 : loss : 0.098322, loss_ce: 0.117199
iteration 8340 : loss : 0.093866, loss_ce: 0.114846
iteration 8350 : loss : 0.109146, loss_ce: 0.142240
iteration 8360 : loss : 0.100511, loss_ce: 0.136222
iteration 8370 : loss : 0.096025, loss_ce: 0.124713
iteration 8380 : loss : 0.100826, loss_ce: 0.127972
iteration 8390 : loss : 0.092026, loss_ce: 0.115987
iteration 8400 : loss : 0.087411, loss_ce: 0.102647
iteration 8410 : loss : 0.107849, loss_ce: 0.137716
iteration 8420 : loss : 0.121310, loss_ce: 0.151644
iteration 8430 : loss : 0.105557, loss_ce: 0.137057
iteration 8440 : loss : 0.085206, loss_ce: 0.099626
iteration 8450 : loss : 0.087438, loss_ce: 0.098594
iteration 8460 : loss : 0.093184, loss_ce: 0.116207
iteration 8470 : loss : 0.095280, loss_ce: 0.125887
iteration 8480 : loss : 0.088152, loss_ce: 0.110099
iteration 8490 : loss : 0.083479, loss_ce: 0.105941
iteration 8500 : loss : 0.097054, loss_ce: 0.116925
iteration 8510 : loss : 0.091394, loss_ce: 0.109526
iteration 8520 : loss : 0.075413, loss_ce: 0.088920
iteration 8530 : loss : 0.090703, loss_ce: 0.108343
iteration 8540 : loss : 0.091654, loss_ce: 0.106796
iteration 8550 : loss : 0.117359, loss_ce: 0.144991
iteration 8560 : loss : 0.101421, loss_ce: 0.127814
iteration 8570 : loss : 0.100370, loss_ce: 0.119994
iteration 8580 : loss : 0.117244, loss_ce: 0.153303
iteration 8590 : loss : 0.092978, loss_ce: 0.104221
iteration 8600 : loss : 0.106631, loss_ce: 0.130475
iteration 8610 : loss : 0.082912, loss_ce: 0.107617
iteration 8620 : loss : 0.103673, loss_ce: 0.127131
iteration 8630 : loss : 0.102962, loss_ce: 0.126368
iteration 8640 : loss : 0.100129, loss_ce: 0.121748
iteration 8650 : loss : 0.102273, loss_ce: 0.124021
iteration 8660 : loss : 0.096249, loss_ce: 0.119971
iteration 8670 : loss : 0.119485, loss_ce: 0.163572
iteration 8680 : loss : 0.089718, loss_ce: 0.107478
iteration 8690 : loss : 0.089358, loss_ce: 0.116660
iteration 8700 : loss : 0.100914, loss_ce: 0.122667
iteration 8710 : loss : 0.108735, loss_ce: 0.136429
iteration 8720 : loss : 0.093353, loss_ce: 0.118121
iteration 8730 : loss : 0.087952, loss_ce: 0.099117
iteration 8740 : loss : 0.083498, loss_ce: 0.094831
iteration 8750 : loss : 0.115596, loss_ce: 0.160274
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_69_iter_8750.pth
iteration 8760 : loss : 0.113493, loss_ce: 0.136662
iteration 8770 : loss : 0.107495, loss_ce: 0.142274
iteration 8780 : loss : 0.103945, loss_ce: 0.120607
iteration 8790 : loss : 0.096824, loss_ce: 0.115769
iteration 8800 : loss : 0.089597, loss_ce: 0.111387
iteration 8810 : loss : 0.093859, loss_ce: 0.116652
iteration 8820 : loss : 0.084963, loss_ce: 0.110187
iteration 8830 : loss : 0.100172, loss_ce: 0.130233
iteration 8840 : loss : 0.124687, loss_ce: 0.147957
iteration 8850 : loss : 0.117337, loss_ce: 0.151328
iteration 8860 : loss : 0.081932, loss_ce: 0.096297
iteration 8870 : loss : 0.099927, loss_ce: 0.133484
iteration 8880 : loss : 0.066651, loss_ce: 0.071170
iteration 8890 : loss : 0.099646, loss_ce: 0.122098
iteration 8900 : loss : 0.084914, loss_ce: 0.099420
iteration 8910 : loss : 0.095161, loss_ce: 0.112983
iteration 8920 : loss : 0.112253, loss_ce: 0.147327
iteration 8930 : loss : 0.115041, loss_ce: 0.149789
iteration 8940 : loss : 0.094751, loss_ce: 0.118447
iteration 8950 : loss : 0.092215, loss_ce: 0.111336
iteration 8960 : loss : 0.097922, loss_ce: 0.124098
iteration 8970 : loss : 0.096964, loss_ce: 0.108200
iteration 8980 : loss : 0.092922, loss_ce: 0.115030
iteration 8990 : loss : 0.096163, loss_ce: 0.130193
iteration 9000 : loss : 0.106964, loss_ce: 0.122946
iteration 9010 : loss : 0.091857, loss_ce: 0.109767
iteration 9020 : loss : 0.097149, loss_ce: 0.111609
iteration 9030 : loss : 0.088262, loss_ce: 0.113882
iteration 9040 : loss : 0.091508, loss_ce: 0.116083
iteration 9050 : loss : 0.083786, loss_ce: 0.110882
iteration 9060 : loss : 0.098337, loss_ce: 0.122626
iteration 9070 : loss : 0.081403, loss_ce: 0.096467
iteration 9080 : loss : 0.096616, loss_ce: 0.115759
iteration 9090 : loss : 0.122175, loss_ce: 0.150667
iteration 9100 : loss : 0.122580, loss_ce: 0.157241
iteration 9110 : loss : 0.128639, loss_ce: 0.165801
iteration 9120 : loss : 0.090101, loss_ce: 0.114953
iteration 9130 : loss : 0.096625, loss_ce: 0.122777
iteration 9140 : loss : 0.095314, loss_ce: 0.108226
iteration 9150 : loss : 0.099951, loss_ce: 0.118075
iteration 9160 : loss : 0.095872, loss_ce: 0.123155
iteration 9170 : loss : 0.111912, loss_ce: 0.146943
iteration 9180 : loss : 0.095799, loss_ce: 0.119128
iteration 9190 : loss : 0.099154, loss_ce: 0.118064
iteration 9200 : loss : 0.107732, loss_ce: 0.136699
iteration 9210 : loss : 0.089145, loss_ce: 0.102271
iteration 9220 : loss : 0.107162, loss_ce: 0.144934
iteration 9230 : loss : 0.109776, loss_ce: 0.148663
iteration 9240 : loss : 0.078719, loss_ce: 0.096309
iteration 9250 : loss : 0.113618, loss_ce: 0.118493
iteration 9260 : loss : 0.078539, loss_ce: 0.098756
iteration 9270 : loss : 0.087668, loss_ce: 0.111385
iteration 9280 : loss : 0.092351, loss_ce: 0.112800
iteration 9290 : loss : 0.100473, loss_ce: 0.127318
iteration 9300 : loss : 0.076918, loss_ce: 0.094476
iteration 9310 : loss : 0.103787, loss_ce: 0.126994
iteration 9320 : loss : 0.097488, loss_ce: 0.123969
iteration 9330 : loss : 0.073343, loss_ce: 0.084120
iteration 9340 : loss : 0.098149, loss_ce: 0.130426
iteration 9350 : loss : 0.097694, loss_ce: 0.127120
iteration 9360 : loss : 0.096809, loss_ce: 0.124555
iteration 9370 : loss : 0.090340, loss_ce: 0.114724
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_74_iter_9375.pth
iteration 9380 : loss : 0.084535, loss_ce: 0.090348
iteration 9390 : loss : 0.114783, loss_ce: 0.158142
iteration 9400 : loss : 0.078844, loss_ce: 0.095697
iteration 9410 : loss : 0.086576, loss_ce: 0.105024
iteration 9420 : loss : 0.091167, loss_ce: 0.110479
iteration 9430 : loss : 0.096514, loss_ce: 0.114534
iteration 9440 : loss : 0.091468, loss_ce: 0.115046
iteration 9450 : loss : 0.094036, loss_ce: 0.115786
iteration 9460 : loss : 0.079054, loss_ce: 0.085809
iteration 9470 : loss : 0.092858, loss_ce: 0.120227
iteration 9480 : loss : 0.091275, loss_ce: 0.113021
iteration 9490 : loss : 0.064572, loss_ce: 0.070435
iteration 9500 : loss : 0.126525, loss_ce: 0.164170
iteration 9510 : loss : 0.096069, loss_ce: 0.119511
iteration 9520 : loss : 0.104209, loss_ce: 0.135216
iteration 9530 : loss : 0.079202, loss_ce: 0.104681
iteration 9540 : loss : 0.094574, loss_ce: 0.124834
iteration 9550 : loss : 0.085874, loss_ce: 0.090396
iteration 9560 : loss : 0.101668, loss_ce: 0.123416
iteration 9570 : loss : 0.108368, loss_ce: 0.146209
iteration 9580 : loss : 0.098523, loss_ce: 0.118973
iteration 9590 : loss : 0.072356, loss_ce: 0.080220
iteration 9600 : loss : 0.099297, loss_ce: 0.120791
iteration 9610 : loss : 0.075802, loss_ce: 0.094382
iteration 9620 : loss : 0.092353, loss_ce: 0.117794
iteration 9630 : loss : 0.106768, loss_ce: 0.132818
iteration 9640 : loss : 0.086995, loss_ce: 0.108681
iteration 9650 : loss : 0.085090, loss_ce: 0.112582
iteration 9660 : loss : 0.096684, loss_ce: 0.117912
iteration 9670 : loss : 0.113080, loss_ce: 0.148509
iteration 9680 : loss : 0.078385, loss_ce: 0.091127
iteration 9690 : loss : 0.085083, loss_ce: 0.103357
iteration 9700 : loss : 0.095384, loss_ce: 0.117185
iteration 9710 : loss : 0.092297, loss_ce: 0.115023
iteration 9720 : loss : 0.079504, loss_ce: 0.098697
iteration 9730 : loss : 0.112331, loss_ce: 0.141908
iteration 9740 : loss : 0.070392, loss_ce: 0.083258
iteration 9750 : loss : 0.090165, loss_ce: 0.123438
iteration 9760 : loss : 0.088817, loss_ce: 0.115106
iteration 9770 : loss : 0.100869, loss_ce: 0.120560
iteration 9780 : loss : 0.101059, loss_ce: 0.122750
iteration 9790 : loss : 0.078741, loss_ce: 0.094333
iteration 9800 : loss : 0.094628, loss_ce: 0.121713
iteration 9810 : loss : 0.091094, loss_ce: 0.117303
iteration 9820 : loss : 0.078178, loss_ce: 0.076456
iteration 9830 : loss : 0.080980, loss_ce: 0.096718
iteration 9840 : loss : 0.095602, loss_ce: 0.126483
iteration 9850 : loss : 0.078666, loss_ce: 0.099363
iteration 9860 : loss : 0.078782, loss_ce: 0.087971
iteration 9870 : loss : 0.102608, loss_ce: 0.135944
iteration 9880 : loss : 0.076092, loss_ce: 0.091654
iteration 9890 : loss : 0.105541, loss_ce: 0.139269
iteration 9900 : loss : 0.093921, loss_ce: 0.117578
iteration 9910 : loss : 0.088829, loss_ce: 0.099221
iteration 9920 : loss : 0.115608, loss_ce: 0.147254
iteration 9930 : loss : 0.077152, loss_ce: 0.087644
iteration 9940 : loss : 0.081661, loss_ce: 0.096972
iteration 9950 : loss : 0.086811, loss_ce: 0.106862
iteration 9960 : loss : 0.120973, loss_ce: 0.163637
iteration 9970 : loss : 0.082027, loss_ce: 0.092487
iteration 9980 : loss : 0.078730, loss_ce: 0.103294
iteration 9990 : loss : 0.074094, loss_ce: 0.084221
iteration 10000 : loss : 0.062876, loss_ce: 0.083464
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_79_iter_10000.pth
iteration 10010 : loss : 0.097630, loss_ce: 0.120639
iteration 10020 : loss : 0.098091, loss_ce: 0.124338
iteration 10030 : loss : 0.075574, loss_ce: 0.090510
iteration 10040 : loss : 0.087639, loss_ce: 0.105862
iteration 10050 : loss : 0.083454, loss_ce: 0.100974
iteration 10060 : loss : 0.080815, loss_ce: 0.096471
iteration 10070 : loss : 0.084715, loss_ce: 0.103475
iteration 10080 : loss : 0.078279, loss_ce: 0.079840
iteration 10090 : loss : 0.082460, loss_ce: 0.095212
iteration 10100 : loss : 0.100137, loss_ce: 0.125295
iteration 10110 : loss : 0.097971, loss_ce: 0.124770
iteration 10120 : loss : 0.117341, loss_ce: 0.159566
iteration 10130 : loss : 0.069816, loss_ce: 0.082422
iteration 10140 : loss : 0.099413, loss_ce: 0.127066
iteration 10150 : loss : 0.086981, loss_ce: 0.100955
iteration 10160 : loss : 0.086797, loss_ce: 0.112914
iteration 10170 : loss : 0.084556, loss_ce: 0.102157
iteration 10180 : loss : 0.082987, loss_ce: 0.104280
iteration 10190 : loss : 0.110369, loss_ce: 0.140087
iteration 10200 : loss : 0.102130, loss_ce: 0.125898
iteration 10210 : loss : 0.096088, loss_ce: 0.112909
iteration 10220 : loss : 0.105262, loss_ce: 0.131818
iteration 10230 : loss : 0.085843, loss_ce: 0.107219
iteration 10240 : loss : 0.082057, loss_ce: 0.097849
iteration 10250 : loss : 0.098409, loss_ce: 0.121208
iteration 10260 : loss : 0.083245, loss_ce: 0.097734
iteration 10270 : loss : 0.094360, loss_ce: 0.121955
iteration 10280 : loss : 0.074336, loss_ce: 0.092272
iteration 10290 : loss : 0.079535, loss_ce: 0.101500
iteration 10300 : loss : 0.089944, loss_ce: 0.105814
iteration 10310 : loss : 0.090200, loss_ce: 0.116979
iteration 10320 : loss : 0.109129, loss_ce: 0.138442
iteration 10330 : loss : 0.095016, loss_ce: 0.114634
iteration 10340 : loss : 0.104925, loss_ce: 0.124371
iteration 10350 : loss : 0.096338, loss_ce: 0.124595
iteration 10360 : loss : 0.095042, loss_ce: 0.123628
iteration 10370 : loss : 0.083595, loss_ce: 0.107781
iteration 10380 : loss : 0.103601, loss_ce: 0.135958
iteration 10390 : loss : 0.083721, loss_ce: 0.108728
iteration 10400 : loss : 0.089104, loss_ce: 0.103338
iteration 10410 : loss : 0.104971, loss_ce: 0.128429
iteration 10420 : loss : 0.087085, loss_ce: 0.104860
iteration 10430 : loss : 0.108133, loss_ce: 0.136051
iteration 10440 : loss : 0.077001, loss_ce: 0.096549
iteration 10450 : loss : 0.080976, loss_ce: 0.089902
iteration 10460 : loss : 0.089394, loss_ce: 0.121811
iteration 10470 : loss : 0.088627, loss_ce: 0.099441
iteration 10480 : loss : 0.080975, loss_ce: 0.089422
iteration 10490 : loss : 0.088444, loss_ce: 0.113619
iteration 10500 : loss : 0.108779, loss_ce: 0.138469
iteration 10510 : loss : 0.091997, loss_ce: 0.111404
iteration 10520 : loss : 0.074586, loss_ce: 0.092828
iteration 10530 : loss : 0.090540, loss_ce: 0.103919
iteration 10540 : loss : 0.097273, loss_ce: 0.124318
iteration 10550 : loss : 0.082164, loss_ce: 0.100188
iteration 10560 : loss : 0.100812, loss_ce: 0.124781
iteration 10570 : loss : 0.096389, loss_ce: 0.115156
iteration 10580 : loss : 0.094088, loss_ce: 0.125667
iteration 10590 : loss : 0.077880, loss_ce: 0.092286
iteration 10600 : loss : 0.100188, loss_ce: 0.135949
iteration 10610 : loss : 0.078762, loss_ce: 0.095084
iteration 10620 : loss : 0.089279, loss_ce: 0.114694
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_84_iter_10625.pth
iteration 10630 : loss : 0.090558, loss_ce: 0.118949
iteration 10640 : loss : 0.083269, loss_ce: 0.097521
iteration 10650 : loss : 0.093595, loss_ce: 0.118557
iteration 10660 : loss : 0.100916, loss_ce: 0.126514
iteration 10670 : loss : 0.101990, loss_ce: 0.127288
iteration 10680 : loss : 0.088390, loss_ce: 0.101923
iteration 10690 : loss : 0.093267, loss_ce: 0.117421
iteration 10700 : loss : 0.080359, loss_ce: 0.103764
iteration 10710 : loss : 0.085384, loss_ce: 0.099614
iteration 10720 : loss : 0.111501, loss_ce: 0.151400
iteration 10730 : loss : 0.062878, loss_ce: 0.070411
iteration 10740 : loss : 0.102747, loss_ce: 0.140500
iteration 10750 : loss : 0.079831, loss_ce: 0.092938
iteration 10760 : loss : 0.081986, loss_ce: 0.103160
iteration 10770 : loss : 0.075200, loss_ce: 0.092950
iteration 10780 : loss : 0.072546, loss_ce: 0.084887
iteration 10790 : loss : 0.075801, loss_ce: 0.090824
iteration 10800 : loss : 0.083515, loss_ce: 0.099006
iteration 10810 : loss : 0.086557, loss_ce: 0.110942
iteration 10820 : loss : 0.061667, loss_ce: 0.072950
iteration 10830 : loss : 0.100267, loss_ce: 0.129681
iteration 10840 : loss : 0.096652, loss_ce: 0.125838
iteration 10850 : loss : 0.097570, loss_ce: 0.124328
iteration 10860 : loss : 0.059794, loss_ce: 0.067547
iteration 10870 : loss : 0.074062, loss_ce: 0.094706
iteration 10880 : loss : 0.088684, loss_ce: 0.113127
iteration 10890 : loss : 0.088183, loss_ce: 0.101332
iteration 10900 : loss : 0.089660, loss_ce: 0.114461
iteration 10910 : loss : 0.084607, loss_ce: 0.102701
iteration 10920 : loss : 0.079589, loss_ce: 0.096138
iteration 10930 : loss : 0.093619, loss_ce: 0.117950
iteration 10940 : loss : 0.090973, loss_ce: 0.113806
iteration 10950 : loss : 0.080635, loss_ce: 0.102636
iteration 10960 : loss : 0.090684, loss_ce: 0.119464
iteration 10970 : loss : 0.099132, loss_ce: 0.126942
iteration 10980 : loss : 0.080786, loss_ce: 0.099243
iteration 10990 : loss : 0.087627, loss_ce: 0.115460
iteration 11000 : loss : 0.080671, loss_ce: 0.098646
iteration 11010 : loss : 0.092052, loss_ce: 0.103192
iteration 11020 : loss : 0.080950, loss_ce: 0.094892
iteration 11030 : loss : 0.098442, loss_ce: 0.118407
iteration 11040 : loss : 0.070013, loss_ce: 0.077410
iteration 11050 : loss : 0.091668, loss_ce: 0.119171
iteration 11060 : loss : 0.078992, loss_ce: 0.083712
iteration 11070 : loss : 0.101224, loss_ce: 0.131268
iteration 11080 : loss : 0.123427, loss_ce: 0.166440
iteration 11090 : loss : 0.085541, loss_ce: 0.106291
iteration 11100 : loss : 0.102423, loss_ce: 0.129037
iteration 11110 : loss : 0.094693, loss_ce: 0.113477
iteration 11120 : loss : 0.079940, loss_ce: 0.095083
iteration 11130 : loss : 0.089560, loss_ce: 0.111504
iteration 11140 : loss : 0.074348, loss_ce: 0.086340
iteration 11150 : loss : 0.083142, loss_ce: 0.100072
iteration 11160 : loss : 0.116917, loss_ce: 0.158619
iteration 11170 : loss : 0.077460, loss_ce: 0.087920
iteration 11180 : loss : 0.085271, loss_ce: 0.104645
iteration 11190 : loss : 0.091485, loss_ce: 0.109674
iteration 11200 : loss : 0.095553, loss_ce: 0.115588
iteration 11210 : loss : 0.110137, loss_ce: 0.140220
iteration 11220 : loss : 0.094743, loss_ce: 0.124572
iteration 11230 : loss : 0.067823, loss_ce: 0.070717
iteration 11240 : loss : 0.078809, loss_ce: 0.090788
iteration 11250 : loss : 0.115123, loss_ce: 0.133817
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_89_iter_11250.pth
iteration 11260 : loss : 0.076460, loss_ce: 0.092722
iteration 11270 : loss : 0.094492, loss_ce: 0.123951
iteration 11280 : loss : 0.090165, loss_ce: 0.095820
iteration 11290 : loss : 0.089512, loss_ce: 0.112169
iteration 11300 : loss : 0.082005, loss_ce: 0.099461
iteration 11310 : loss : 0.067968, loss_ce: 0.078568
iteration 11320 : loss : 0.098727, loss_ce: 0.123269
iteration 11330 : loss : 0.119768, loss_ce: 0.161744
iteration 11340 : loss : 0.102273, loss_ce: 0.129170
iteration 11350 : loss : 0.094023, loss_ce: 0.113973
iteration 11360 : loss : 0.108741, loss_ce: 0.136654
iteration 11370 : loss : 0.088986, loss_ce: 0.118552
iteration 11380 : loss : 0.104332, loss_ce: 0.137684
iteration 11390 : loss : 0.089134, loss_ce: 0.111551
iteration 11400 : loss : 0.092534, loss_ce: 0.109134
iteration 11410 : loss : 0.090189, loss_ce: 0.114946
iteration 11420 : loss : 0.114774, loss_ce: 0.155045
iteration 11430 : loss : 0.078735, loss_ce: 0.098177
iteration 11440 : loss : 0.079606, loss_ce: 0.099655
iteration 11450 : loss : 0.096378, loss_ce: 0.116580
iteration 11460 : loss : 0.068061, loss_ce: 0.080278
iteration 11470 : loss : 0.110208, loss_ce: 0.135865
iteration 11480 : loss : 0.091372, loss_ce: 0.109655
iteration 11490 : loss : 0.070689, loss_ce: 0.083230
iteration 11500 : loss : 0.089794, loss_ce: 0.112576
iteration 11510 : loss : 0.083085, loss_ce: 0.104881
iteration 11520 : loss : 0.082119, loss_ce: 0.101368
iteration 11530 : loss : 0.075580, loss_ce: 0.098832
iteration 11540 : loss : 0.067691, loss_ce: 0.080514
iteration 11550 : loss : 0.083939, loss_ce: 0.097590
iteration 11560 : loss : 0.108015, loss_ce: 0.143257
iteration 11570 : loss : 0.068689, loss_ce: 0.088916
iteration 11580 : loss : 0.089754, loss_ce: 0.114155
iteration 11590 : loss : 0.102284, loss_ce: 0.137633
iteration 11600 : loss : 0.091974, loss_ce: 0.117400
iteration 11610 : loss : 0.099563, loss_ce: 0.125618
iteration 11620 : loss : 0.099891, loss_ce: 0.131221
iteration 11630 : loss : 0.111611, loss_ce: 0.148027
iteration 11640 : loss : 0.064989, loss_ce: 0.074102
iteration 11650 : loss : 0.082138, loss_ce: 0.102657
iteration 11660 : loss : 0.089160, loss_ce: 0.112520
iteration 11670 : loss : 0.081450, loss_ce: 0.099185
iteration 11680 : loss : 0.077117, loss_ce: 0.101262
iteration 11690 : loss : 0.096224, loss_ce: 0.115307
iteration 11700 : loss : 0.098705, loss_ce: 0.134605
iteration 11710 : loss : 0.092430, loss_ce: 0.120290
iteration 11720 : loss : 0.098579, loss_ce: 0.119621
iteration 11730 : loss : 0.107744, loss_ce: 0.132797
iteration 11740 : loss : 0.084913, loss_ce: 0.109981
iteration 11750 : loss : 0.047774, loss_ce: 0.038551
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_93_iter_11750_loss_0.0386.pth with loss 0.0386
Conditional saves: 1/5
iteration 11760 : loss : 0.082700, loss_ce: 0.100009
iteration 11770 : loss : 0.088281, loss_ce: 0.105867
iteration 11780 : loss : 0.087635, loss_ce: 0.106728
iteration 11790 : loss : 0.075213, loss_ce: 0.092850
iteration 11800 : loss : 0.098807, loss_ce: 0.134969
iteration 11810 : loss : 0.081615, loss_ce: 0.098286
iteration 11820 : loss : 0.081933, loss_ce: 0.100278
iteration 11830 : loss : 0.097511, loss_ce: 0.126642
iteration 11840 : loss : 0.098470, loss_ce: 0.125345
iteration 11850 : loss : 0.079349, loss_ce: 0.106394
iteration 11860 : loss : 0.071679, loss_ce: 0.081905
iteration 11870 : loss : 0.069688, loss_ce: 0.083588
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_94_iter_11875.pth
iteration 11880 : loss : 0.088469, loss_ce: 0.117107
iteration 11890 : loss : 0.068057, loss_ce: 0.083024
iteration 11900 : loss : 0.070888, loss_ce: 0.084844
iteration 11910 : loss : 0.090334, loss_ce: 0.120090
iteration 11920 : loss : 0.072408, loss_ce: 0.087756
iteration 11930 : loss : 0.089955, loss_ce: 0.110379
iteration 11940 : loss : 0.086330, loss_ce: 0.111368
iteration 11950 : loss : 0.066208, loss_ce: 0.085121
iteration 11960 : loss : 0.068736, loss_ce: 0.087934
iteration 11970 : loss : 0.090870, loss_ce: 0.122550
iteration 11980 : loss : 0.087185, loss_ce: 0.103385
iteration 11990 : loss : 0.087415, loss_ce: 0.103163
iteration 12000 : loss : 0.065420, loss_ce: 0.074134
iteration 12010 : loss : 0.070713, loss_ce: 0.074345
iteration 12020 : loss : 0.087057, loss_ce: 0.110782
iteration 12030 : loss : 0.073918, loss_ce: 0.093715
iteration 12040 : loss : 0.086427, loss_ce: 0.104631
iteration 12050 : loss : 0.109119, loss_ce: 0.137379
iteration 12060 : loss : 0.106099, loss_ce: 0.145327
iteration 12070 : loss : 0.069988, loss_ce: 0.086772
iteration 12080 : loss : 0.085783, loss_ce: 0.102325
iteration 12090 : loss : 0.091560, loss_ce: 0.118867
iteration 12100 : loss : 0.074647, loss_ce: 0.093215
iteration 12110 : loss : 0.078806, loss_ce: 0.104989
iteration 12120 : loss : 0.077461, loss_ce: 0.095686
iteration 12130 : loss : 0.093177, loss_ce: 0.119625
iteration 12140 : loss : 0.088447, loss_ce: 0.107652
iteration 12150 : loss : 0.065835, loss_ce: 0.078156
iteration 12160 : loss : 0.068652, loss_ce: 0.084298
iteration 12170 : loss : 0.089532, loss_ce: 0.107272
iteration 12180 : loss : 0.074089, loss_ce: 0.091637
iteration 12190 : loss : 0.090211, loss_ce: 0.112051
iteration 12200 : loss : 0.070242, loss_ce: 0.087417
iteration 12210 : loss : 0.083907, loss_ce: 0.105475
iteration 12220 : loss : 0.087965, loss_ce: 0.108537
iteration 12230 : loss : 0.089466, loss_ce: 0.117685
iteration 12240 : loss : 0.083242, loss_ce: 0.104435
iteration 12250 : loss : 0.059809, loss_ce: 0.072841
iteration 12260 : loss : 0.083118, loss_ce: 0.102014
iteration 12270 : loss : 0.113708, loss_ce: 0.153960
iteration 12280 : loss : 0.083514, loss_ce: 0.105209
iteration 12290 : loss : 0.071640, loss_ce: 0.086833
iteration 12300 : loss : 0.078315, loss_ce: 0.103354
iteration 12310 : loss : 0.074823, loss_ce: 0.091586
iteration 12320 : loss : 0.076369, loss_ce: 0.085607
iteration 12330 : loss : 0.083766, loss_ce: 0.106041
iteration 12340 : loss : 0.101057, loss_ce: 0.133599
iteration 12350 : loss : 0.069840, loss_ce: 0.086590
iteration 12360 : loss : 0.062554, loss_ce: 0.070719
iteration 12370 : loss : 0.078965, loss_ce: 0.090023
iteration 12380 : loss : 0.086018, loss_ce: 0.109815
iteration 12390 : loss : 0.080709, loss_ce: 0.092569
iteration 12400 : loss : 0.082216, loss_ce: 0.097621
iteration 12410 : loss : 0.091246, loss_ce: 0.116222
iteration 12420 : loss : 0.081598, loss_ce: 0.094441
iteration 12430 : loss : 0.081870, loss_ce: 0.098607
iteration 12440 : loss : 0.077095, loss_ce: 0.096141
iteration 12450 : loss : 0.076888, loss_ce: 0.093458
iteration 12460 : loss : 0.067264, loss_ce: 0.081370
iteration 12470 : loss : 0.076856, loss_ce: 0.099058
iteration 12480 : loss : 0.075111, loss_ce: 0.088097
iteration 12490 : loss : 0.071864, loss_ce: 0.089658
iteration 12500 : loss : 0.067671, loss_ce: 0.075709
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_99.pth
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_99_iter_12500.pth
iteration 12510 : loss : 0.091604, loss_ce: 0.112574
iteration 12520 : loss : 0.080858, loss_ce: 0.097028
iteration 12530 : loss : 0.067642, loss_ce: 0.079754
iteration 12540 : loss : 0.083018, loss_ce: 0.105054
iteration 12550 : loss : 0.064911, loss_ce: 0.069518
iteration 12560 : loss : 0.075083, loss_ce: 0.089789
iteration 12570 : loss : 0.100051, loss_ce: 0.136471
iteration 12580 : loss : 0.088653, loss_ce: 0.114182
iteration 12590 : loss : 0.102143, loss_ce: 0.120336
iteration 12600 : loss : 0.077700, loss_ce: 0.097184
iteration 12610 : loss : 0.077399, loss_ce: 0.089576
iteration 12620 : loss : 0.119657, loss_ce: 0.156532
iteration 12630 : loss : 0.076791, loss_ce: 0.092089
iteration 12640 : loss : 0.080613, loss_ce: 0.103842
iteration 12650 : loss : 0.080024, loss_ce: 0.091324
iteration 12660 : loss : 0.093887, loss_ce: 0.118859
iteration 12670 : loss : 0.088219, loss_ce: 0.108480
iteration 12680 : loss : 0.068083, loss_ce: 0.077909
iteration 12690 : loss : 0.073823, loss_ce: 0.090869
iteration 12700 : loss : 0.100798, loss_ce: 0.128519
iteration 12710 : loss : 0.085383, loss_ce: 0.102419
iteration 12720 : loss : 0.084604, loss_ce: 0.107944
iteration 12730 : loss : 0.087594, loss_ce: 0.108994
iteration 12740 : loss : 0.104040, loss_ce: 0.130716
iteration 12750 : loss : 0.103179, loss_ce: 0.125410
iteration 12760 : loss : 0.110974, loss_ce: 0.142016
iteration 12770 : loss : 0.079097, loss_ce: 0.095289
iteration 12780 : loss : 0.075879, loss_ce: 0.099283
iteration 12790 : loss : 0.087107, loss_ce: 0.098488
iteration 12800 : loss : 0.069887, loss_ce: 0.090767
iteration 12810 : loss : 0.074620, loss_ce: 0.089841
iteration 12820 : loss : 0.084283, loss_ce: 0.113862
iteration 12830 : loss : 0.076974, loss_ce: 0.093685
iteration 12840 : loss : 0.073293, loss_ce: 0.088634
iteration 12850 : loss : 0.089649, loss_ce: 0.116066
iteration 12860 : loss : 0.102183, loss_ce: 0.132431
iteration 12870 : loss : 0.079233, loss_ce: 0.095809
iteration 12880 : loss : 0.091511, loss_ce: 0.112347
iteration 12890 : loss : 0.080753, loss_ce: 0.107345
iteration 12900 : loss : 0.091066, loss_ce: 0.116444
iteration 12910 : loss : 0.083637, loss_ce: 0.109059
iteration 12920 : loss : 0.079213, loss_ce: 0.101717
iteration 12930 : loss : 0.072135, loss_ce: 0.084839
iteration 12940 : loss : 0.077919, loss_ce: 0.097323
iteration 12950 : loss : 0.111424, loss_ce: 0.148691
iteration 12960 : loss : 0.096674, loss_ce: 0.121298
iteration 12970 : loss : 0.079037, loss_ce: 0.099750
iteration 12980 : loss : 0.079982, loss_ce: 0.099294
iteration 12990 : loss : 0.083006, loss_ce: 0.108390
iteration 13000 : loss : 0.112835, loss_ce: 0.148532
iteration 13010 : loss : 0.078526, loss_ce: 0.096007
iteration 13020 : loss : 0.097810, loss_ce: 0.128785
iteration 13030 : loss : 0.073015, loss_ce: 0.092574
iteration 13040 : loss : 0.075884, loss_ce: 0.095822
iteration 13050 : loss : 0.098095, loss_ce: 0.124533
iteration 13060 : loss : 0.091912, loss_ce: 0.114126
iteration 13070 : loss : 0.078360, loss_ce: 0.101500
iteration 13080 : loss : 0.087783, loss_ce: 0.118460
iteration 13090 : loss : 0.079688, loss_ce: 0.092994
iteration 13100 : loss : 0.069065, loss_ce: 0.078173
iteration 13110 : loss : 0.083686, loss_ce: 0.112095
iteration 13120 : loss : 0.077522, loss_ce: 0.097940
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_104_iter_13125.pth
iteration 13130 : loss : 0.076080, loss_ce: 0.097451
iteration 13140 : loss : 0.080401, loss_ce: 0.100948
iteration 13150 : loss : 0.079980, loss_ce: 0.095230
iteration 13160 : loss : 0.081332, loss_ce: 0.091319
iteration 13170 : loss : 0.075119, loss_ce: 0.091671
iteration 13180 : loss : 0.075118, loss_ce: 0.080302
iteration 13190 : loss : 0.094199, loss_ce: 0.121107
iteration 13200 : loss : 0.086383, loss_ce: 0.102196
iteration 13210 : loss : 0.077313, loss_ce: 0.101176
iteration 13220 : loss : 0.085954, loss_ce: 0.111914
iteration 13230 : loss : 0.097854, loss_ce: 0.123903
iteration 13240 : loss : 0.080208, loss_ce: 0.080873
iteration 13250 : loss : 0.087243, loss_ce: 0.090591
iteration 13260 : loss : 0.078367, loss_ce: 0.096139
iteration 13270 : loss : 0.076000, loss_ce: 0.100595
iteration 13280 : loss : 0.074087, loss_ce: 0.087193
iteration 13290 : loss : 0.078321, loss_ce: 0.100031
iteration 13300 : loss : 0.106401, loss_ce: 0.143581
iteration 13310 : loss : 0.073916, loss_ce: 0.082604
iteration 13320 : loss : 0.065462, loss_ce: 0.082392
iteration 13330 : loss : 0.083698, loss_ce: 0.112243
iteration 13340 : loss : 0.085339, loss_ce: 0.101729
iteration 13350 : loss : 0.088509, loss_ce: 0.116070
iteration 13360 : loss : 0.073879, loss_ce: 0.092644
iteration 13370 : loss : 0.086599, loss_ce: 0.112790
iteration 13380 : loss : 0.086469, loss_ce: 0.108806
iteration 13390 : loss : 0.085631, loss_ce: 0.116202
iteration 13400 : loss : 0.074630, loss_ce: 0.090911
iteration 13410 : loss : 0.089395, loss_ce: 0.116843
iteration 13420 : loss : 0.078327, loss_ce: 0.101590
iteration 13430 : loss : 0.071905, loss_ce: 0.089282
iteration 13440 : loss : 0.081408, loss_ce: 0.100577
iteration 13450 : loss : 0.095653, loss_ce: 0.127965
iteration 13460 : loss : 0.081476, loss_ce: 0.094355
iteration 13470 : loss : 0.074133, loss_ce: 0.096287
iteration 13480 : loss : 0.090058, loss_ce: 0.110406
iteration 13490 : loss : 0.070669, loss_ce: 0.072582
iteration 13500 : loss : 0.063378, loss_ce: 0.078044
iteration 13510 : loss : 0.066779, loss_ce: 0.081963
iteration 13520 : loss : 0.092675, loss_ce: 0.108896
iteration 13530 : loss : 0.081395, loss_ce: 0.108206
iteration 13540 : loss : 0.072280, loss_ce: 0.090941
iteration 13550 : loss : 0.076538, loss_ce: 0.082750
iteration 13560 : loss : 0.081285, loss_ce: 0.105818
iteration 13570 : loss : 0.072298, loss_ce: 0.085791
iteration 13580 : loss : 0.088708, loss_ce: 0.111470
iteration 13590 : loss : 0.093925, loss_ce: 0.121137
iteration 13600 : loss : 0.080012, loss_ce: 0.104064
iteration 13610 : loss : 0.070577, loss_ce: 0.089196
iteration 13620 : loss : 0.084377, loss_ce: 0.105647
iteration 13630 : loss : 0.086326, loss_ce: 0.112992
iteration 13640 : loss : 0.065167, loss_ce: 0.080592
iteration 13650 : loss : 0.097905, loss_ce: 0.130878
iteration 13660 : loss : 0.068573, loss_ce: 0.076573
iteration 13670 : loss : 0.082859, loss_ce: 0.101440
iteration 13680 : loss : 0.079868, loss_ce: 0.101296
iteration 13690 : loss : 0.069865, loss_ce: 0.090063
iteration 13700 : loss : 0.088701, loss_ce: 0.107368
iteration 13710 : loss : 0.089346, loss_ce: 0.115352
iteration 13720 : loss : 0.100332, loss_ce: 0.138294
iteration 13730 : loss : 0.081328, loss_ce: 0.104212
iteration 13740 : loss : 0.086116, loss_ce: 0.111951
iteration 13750 : loss : 0.072933, loss_ce: 0.091049
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_109_iter_13750.pth
iteration 13760 : loss : 0.068574, loss_ce: 0.091691
iteration 13770 : loss : 0.069228, loss_ce: 0.087302
iteration 13780 : loss : 0.072923, loss_ce: 0.093319
iteration 13790 : loss : 0.081341, loss_ce: 0.105163
iteration 13800 : loss : 0.079497, loss_ce: 0.103709
iteration 13810 : loss : 0.103649, loss_ce: 0.140824
iteration 13820 : loss : 0.078947, loss_ce: 0.093161
iteration 13830 : loss : 0.088547, loss_ce: 0.108459
iteration 13840 : loss : 0.075504, loss_ce: 0.093772
iteration 13850 : loss : 0.063011, loss_ce: 0.077518
iteration 13860 : loss : 0.081176, loss_ce: 0.099903
iteration 13870 : loss : 0.072302, loss_ce: 0.089912
iteration 13880 : loss : 0.076601, loss_ce: 0.098308
iteration 13890 : loss : 0.100533, loss_ce: 0.132192
iteration 13900 : loss : 0.072248, loss_ce: 0.087292
iteration 13910 : loss : 0.081968, loss_ce: 0.106867
iteration 13920 : loss : 0.082385, loss_ce: 0.096020
iteration 13930 : loss : 0.095399, loss_ce: 0.118259
iteration 13940 : loss : 0.061127, loss_ce: 0.074411
iteration 13950 : loss : 0.075998, loss_ce: 0.093843
iteration 13960 : loss : 0.088603, loss_ce: 0.118795
iteration 13970 : loss : 0.083428, loss_ce: 0.110330
iteration 13980 : loss : 0.073781, loss_ce: 0.090998
iteration 13990 : loss : 0.082209, loss_ce: 0.109261
iteration 14000 : loss : 0.087364, loss_ce: 0.103561
iteration 14010 : loss : 0.088210, loss_ce: 0.108774
iteration 14020 : loss : 0.079229, loss_ce: 0.095102
iteration 14030 : loss : 0.075347, loss_ce: 0.101464
iteration 14040 : loss : 0.069196, loss_ce: 0.085789
iteration 14050 : loss : 0.084059, loss_ce: 0.108895
iteration 14060 : loss : 0.084066, loss_ce: 0.103917
iteration 14070 : loss : 0.063043, loss_ce: 0.081362
iteration 14080 : loss : 0.075412, loss_ce: 0.093615
iteration 14090 : loss : 0.108843, loss_ce: 0.143167
iteration 14100 : loss : 0.068854, loss_ce: 0.073916
iteration 14110 : loss : 0.073386, loss_ce: 0.091194
iteration 14120 : loss : 0.087586, loss_ce: 0.115004
iteration 14130 : loss : 0.068600, loss_ce: 0.081415
iteration 14140 : loss : 0.065935, loss_ce: 0.074853
iteration 14150 : loss : 0.091127, loss_ce: 0.112461
iteration 14160 : loss : 0.076716, loss_ce: 0.092399
iteration 14170 : loss : 0.080469, loss_ce: 0.109125
iteration 14180 : loss : 0.079411, loss_ce: 0.100979
iteration 14190 : loss : 0.086689, loss_ce: 0.110011
iteration 14200 : loss : 0.085816, loss_ce: 0.100966
iteration 14210 : loss : 0.075964, loss_ce: 0.096756
iteration 14220 : loss : 0.086801, loss_ce: 0.112275
iteration 14230 : loss : 0.081662, loss_ce: 0.091802
iteration 14240 : loss : 0.086711, loss_ce: 0.110569
iteration 14250 : loss : 0.049347, loss_ce: 0.057496
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_113_iter_14250_loss_0.0575.pth with loss 0.0575
Conditional saves: 2/5
iteration 14260 : loss : 0.071971, loss_ce: 0.082486
iteration 14270 : loss : 0.079023, loss_ce: 0.099505
iteration 14280 : loss : 0.082319, loss_ce: 0.101472
iteration 14290 : loss : 0.083469, loss_ce: 0.102463
iteration 14300 : loss : 0.089528, loss_ce: 0.114964
iteration 14310 : loss : 0.082886, loss_ce: 0.112862
iteration 14320 : loss : 0.079596, loss_ce: 0.103932
iteration 14330 : loss : 0.078480, loss_ce: 0.093651
iteration 14340 : loss : 0.072953, loss_ce: 0.089849
iteration 14350 : loss : 0.069189, loss_ce: 0.081788
iteration 14360 : loss : 0.093323, loss_ce: 0.123572
iteration 14370 : loss : 0.073020, loss_ce: 0.086416
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_114_iter_14375.pth
iteration 14380 : loss : 0.065717, loss_ce: 0.078660
iteration 14390 : loss : 0.079913, loss_ce: 0.109651
iteration 14400 : loss : 0.066498, loss_ce: 0.080978
iteration 14410 : loss : 0.072695, loss_ce: 0.088148
iteration 14420 : loss : 0.085433, loss_ce: 0.105911
iteration 14430 : loss : 0.104175, loss_ce: 0.136701
iteration 14440 : loss : 0.070047, loss_ce: 0.091244
iteration 14450 : loss : 0.083281, loss_ce: 0.106023
iteration 14460 : loss : 0.101686, loss_ce: 0.131676
iteration 14470 : loss : 0.070958, loss_ce: 0.090802
iteration 14480 : loss : 0.084392, loss_ce: 0.108254
iteration 14490 : loss : 0.077885, loss_ce: 0.096392
iteration 14500 : loss : 0.058583, loss_ce: 0.066584
iteration 14510 : loss : 0.075205, loss_ce: 0.092766
iteration 14520 : loss : 0.068973, loss_ce: 0.088095
iteration 14530 : loss : 0.078476, loss_ce: 0.101653
iteration 14540 : loss : 0.071260, loss_ce: 0.089284
iteration 14550 : loss : 0.100927, loss_ce: 0.132965
iteration 14560 : loss : 0.073963, loss_ce: 0.080122
iteration 14570 : loss : 0.074857, loss_ce: 0.089627
iteration 14580 : loss : 0.084312, loss_ce: 0.107423
iteration 14590 : loss : 0.081705, loss_ce: 0.105102
iteration 14600 : loss : 0.071354, loss_ce: 0.085597
iteration 14610 : loss : 0.082111, loss_ce: 0.106980
iteration 14620 : loss : 0.076504, loss_ce: 0.087664
iteration 14630 : loss : 0.078205, loss_ce: 0.093937
iteration 14640 : loss : 0.072739, loss_ce: 0.086558
iteration 14650 : loss : 0.078821, loss_ce: 0.098204
iteration 14660 : loss : 0.067640, loss_ce: 0.069677
iteration 14670 : loss : 0.074344, loss_ce: 0.094658
iteration 14680 : loss : 0.081211, loss_ce: 0.104493
iteration 14690 : loss : 0.086075, loss_ce: 0.115049
iteration 14700 : loss : 0.086286, loss_ce: 0.109777
iteration 14710 : loss : 0.075885, loss_ce: 0.098839
iteration 14720 : loss : 0.098651, loss_ce: 0.124050
iteration 14730 : loss : 0.057483, loss_ce: 0.067306
iteration 14740 : loss : 0.075317, loss_ce: 0.090686
iteration 14750 : loss : 0.091727, loss_ce: 0.110595
iteration 14760 : loss : 0.121351, loss_ce: 0.161820
iteration 14770 : loss : 0.087814, loss_ce: 0.111421
iteration 14780 : loss : 0.076699, loss_ce: 0.097760
iteration 14790 : loss : 0.083806, loss_ce: 0.097620
iteration 14800 : loss : 0.078041, loss_ce: 0.096485
iteration 14810 : loss : 0.083193, loss_ce: 0.107235
iteration 14820 : loss : 0.121902, loss_ce: 0.162512
iteration 14830 : loss : 0.086422, loss_ce: 0.113207
iteration 14840 : loss : 0.065961, loss_ce: 0.081879
iteration 14850 : loss : 0.080000, loss_ce: 0.096383
iteration 14860 : loss : 0.074028, loss_ce: 0.085303
iteration 14870 : loss : 0.090356, loss_ce: 0.115903
iteration 14880 : loss : 0.080790, loss_ce: 0.107802
iteration 14890 : loss : 0.084036, loss_ce: 0.107512
iteration 14900 : loss : 0.065017, loss_ce: 0.083338
iteration 14910 : loss : 0.081438, loss_ce: 0.104070
iteration 14920 : loss : 0.057423, loss_ce: 0.068592
iteration 14930 : loss : 0.081324, loss_ce: 0.107766
iteration 14940 : loss : 0.079879, loss_ce: 0.099367
iteration 14950 : loss : 0.076155, loss_ce: 0.098126
iteration 14960 : loss : 0.079611, loss_ce: 0.103900
iteration 14970 : loss : 0.082569, loss_ce: 0.108104
iteration 14980 : loss : 0.083727, loss_ce: 0.109008
iteration 14990 : loss : 0.068356, loss_ce: 0.080624
iteration 15000 : loss : 0.063686, loss_ce: 0.080348
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_119_iter_15000.pth
iteration 15010 : loss : 0.071395, loss_ce: 0.083239
iteration 15020 : loss : 0.076610, loss_ce: 0.091971
iteration 15030 : loss : 0.078181, loss_ce: 0.109859
iteration 15040 : loss : 0.070185, loss_ce: 0.083243
iteration 15050 : loss : 0.088062, loss_ce: 0.108253
iteration 15060 : loss : 0.079001, loss_ce: 0.105733
iteration 15070 : loss : 0.068750, loss_ce: 0.082379
iteration 15080 : loss : 0.080243, loss_ce: 0.105815
iteration 15090 : loss : 0.101954, loss_ce: 0.124451
iteration 15100 : loss : 0.077393, loss_ce: 0.097110
iteration 15110 : loss : 0.074281, loss_ce: 0.087190
iteration 15120 : loss : 0.067351, loss_ce: 0.081965
iteration 15130 : loss : 0.082168, loss_ce: 0.105844
iteration 15140 : loss : 0.070641, loss_ce: 0.084681
iteration 15150 : loss : 0.084687, loss_ce: 0.112250
iteration 15160 : loss : 0.081784, loss_ce: 0.101141
iteration 15170 : loss : 0.077011, loss_ce: 0.096355
iteration 15180 : loss : 0.086750, loss_ce: 0.113147
iteration 15190 : loss : 0.089542, loss_ce: 0.112384
iteration 15200 : loss : 0.069373, loss_ce: 0.084709
iteration 15210 : loss : 0.063763, loss_ce: 0.075779
iteration 15220 : loss : 0.065997, loss_ce: 0.074044
iteration 15230 : loss : 0.079918, loss_ce: 0.101917
iteration 15240 : loss : 0.081125, loss_ce: 0.103686
iteration 15250 : loss : 0.031178, loss_ce: 0.037524
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_121_iter_15250_loss_0.0375.pth with loss 0.0375
Conditional saves: 3/5
iteration 15260 : loss : 0.063019, loss_ce: 0.070607
iteration 15270 : loss : 0.061046, loss_ce: 0.075588
iteration 15280 : loss : 0.066345, loss_ce: 0.084912
iteration 15290 : loss : 0.069692, loss_ce: 0.086982
iteration 15300 : loss : 0.077010, loss_ce: 0.094507
iteration 15310 : loss : 0.076009, loss_ce: 0.098453
iteration 15320 : loss : 0.062408, loss_ce: 0.073404
iteration 15330 : loss : 0.082134, loss_ce: 0.100568
iteration 15340 : loss : 0.086550, loss_ce: 0.112707
iteration 15350 : loss : 0.083372, loss_ce: 0.099299
iteration 15360 : loss : 0.079605, loss_ce: 0.096730
iteration 15370 : loss : 0.078322, loss_ce: 0.094152
iteration 15380 : loss : 0.071745, loss_ce: 0.085351
iteration 15390 : loss : 0.082261, loss_ce: 0.104789
iteration 15400 : loss : 0.081895, loss_ce: 0.107649
iteration 15410 : loss : 0.073021, loss_ce: 0.095800
iteration 15420 : loss : 0.095408, loss_ce: 0.124449
iteration 15430 : loss : 0.071198, loss_ce: 0.092913
iteration 15440 : loss : 0.063119, loss_ce: 0.071228
iteration 15450 : loss : 0.084144, loss_ce: 0.103679
iteration 15460 : loss : 0.060154, loss_ce: 0.068150
iteration 15470 : loss : 0.075783, loss_ce: 0.095213
iteration 15480 : loss : 0.077137, loss_ce: 0.099912
iteration 15490 : loss : 0.085881, loss_ce: 0.113985
iteration 15500 : loss : 0.077056, loss_ce: 0.090835
iteration 15510 : loss : 0.068085, loss_ce: 0.082264
iteration 15520 : loss : 0.081339, loss_ce: 0.105677
iteration 15530 : loss : 0.072425, loss_ce: 0.093662
iteration 15540 : loss : 0.073568, loss_ce: 0.092835
iteration 15550 : loss : 0.073786, loss_ce: 0.094979
iteration 15560 : loss : 0.054554, loss_ce: 0.063154
iteration 15570 : loss : 0.072107, loss_ce: 0.085207
iteration 15580 : loss : 0.083706, loss_ce: 0.113725
iteration 15590 : loss : 0.066680, loss_ce: 0.083393
iteration 15600 : loss : 0.068632, loss_ce: 0.092501
iteration 15610 : loss : 0.089414, loss_ce: 0.101280
iteration 15620 : loss : 0.097350, loss_ce: 0.129806
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_124_iter_15625.pth
iteration 15630 : loss : 0.078277, loss_ce: 0.098937
iteration 15640 : loss : 0.071265, loss_ce: 0.084227
iteration 15650 : loss : 0.080044, loss_ce: 0.098096
iteration 15660 : loss : 0.070135, loss_ce: 0.086053
iteration 15670 : loss : 0.051844, loss_ce: 0.061935
iteration 15680 : loss : 0.063612, loss_ce: 0.080070
iteration 15690 : loss : 0.064795, loss_ce: 0.073956
iteration 15700 : loss : 0.074956, loss_ce: 0.091330
iteration 15710 : loss : 0.079007, loss_ce: 0.098841
iteration 15720 : loss : 0.058363, loss_ce: 0.067923
iteration 15730 : loss : 0.067211, loss_ce: 0.083316
iteration 15740 : loss : 0.070674, loss_ce: 0.083733
iteration 15750 : loss : 0.099530, loss_ce: 0.132252
iteration 15760 : loss : 0.087506, loss_ce: 0.110313
iteration 15770 : loss : 0.075853, loss_ce: 0.093097
iteration 15780 : loss : 0.080986, loss_ce: 0.104827
iteration 15790 : loss : 0.062771, loss_ce: 0.073226
iteration 15800 : loss : 0.072269, loss_ce: 0.090761
iteration 15810 : loss : 0.082629, loss_ce: 0.098851
iteration 15820 : loss : 0.068163, loss_ce: 0.083016
iteration 15830 : loss : 0.065519, loss_ce: 0.079041
iteration 15840 : loss : 0.059862, loss_ce: 0.070780
iteration 15850 : loss : 0.057315, loss_ce: 0.067131
iteration 15860 : loss : 0.069210, loss_ce: 0.084106
iteration 15870 : loss : 0.085413, loss_ce: 0.119379
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_126_iter_15875_loss_0.0559.pth with loss 0.0559
Conditional saves: 4/5
iteration 15880 : loss : 0.064312, loss_ce: 0.077805
iteration 15890 : loss : 0.072307, loss_ce: 0.080689
iteration 15900 : loss : 0.070488, loss_ce: 0.085355
iteration 15910 : loss : 0.077878, loss_ce: 0.099484
iteration 15920 : loss : 0.077321, loss_ce: 0.096044
iteration 15930 : loss : 0.073021, loss_ce: 0.094445
iteration 15940 : loss : 0.075062, loss_ce: 0.094754
iteration 15950 : loss : 0.085101, loss_ce: 0.118026
iteration 15960 : loss : 0.070664, loss_ce: 0.088178
iteration 15970 : loss : 0.079524, loss_ce: 0.104053
iteration 15980 : loss : 0.091565, loss_ce: 0.122311
iteration 15990 : loss : 0.097026, loss_ce: 0.122000
iteration 16000 : loss : 0.062511, loss_ce: 0.081225
iteration 16010 : loss : 0.069463, loss_ce: 0.089466
iteration 16020 : loss : 0.069333, loss_ce: 0.084704
iteration 16030 : loss : 0.076774, loss_ce: 0.095717
iteration 16040 : loss : 0.080158, loss_ce: 0.101848
iteration 16050 : loss : 0.071926, loss_ce: 0.090475
iteration 16060 : loss : 0.101108, loss_ce: 0.133768
iteration 16070 : loss : 0.071527, loss_ce: 0.088847
iteration 16080 : loss : 0.070530, loss_ce: 0.087550
iteration 16090 : loss : 0.081852, loss_ce: 0.101715
iteration 16100 : loss : 0.085267, loss_ce: 0.114147
iteration 16110 : loss : 0.071509, loss_ce: 0.091819
iteration 16120 : loss : 0.079715, loss_ce: 0.099369
iteration 16130 : loss : 0.077900, loss_ce: 0.097268
iteration 16140 : loss : 0.069490, loss_ce: 0.078500
iteration 16150 : loss : 0.085180, loss_ce: 0.111757
iteration 16160 : loss : 0.071789, loss_ce: 0.089646
iteration 16170 : loss : 0.070035, loss_ce: 0.090222
iteration 16180 : loss : 0.077523, loss_ce: 0.095829
iteration 16190 : loss : 0.057844, loss_ce: 0.068635
iteration 16200 : loss : 0.081166, loss_ce: 0.086943
iteration 16210 : loss : 0.064908, loss_ce: 0.063012
iteration 16220 : loss : 0.092815, loss_ce: 0.123291
iteration 16230 : loss : 0.067064, loss_ce: 0.082876
iteration 16240 : loss : 0.076006, loss_ce: 0.095594
iteration 16250 : loss : 0.052363, loss_ce: 0.057333
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_129_iter_16250.pth
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_129_iter_16250_loss_0.0573.pth with loss 0.0573
Conditional saves: 5/5
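The interleaved save messages suggest two checkpointing rules: a periodic checkpoint every 625 iterations (every 5 epochs at 125 iterations per epoch, matching epochs 109, 114, 119, ...), and a "LOW_CE" conditional save, capped at 5 per run (the "Conditional saves: n/5" counter). A minimal sketch of such a scheme, assuming a threshold-based rule for the conditional save (the actual training script's criterion and threshold are not shown in this log):

```python
# Hedged sketch of the checkpointing scheme suggested by the log above.
# Assumptions (not confirmed against the training script): periodic saves
# fire every 5 epochs (625 iterations at 125 iterations/epoch), and LOW_CE
# saves fire when the cross-entropy loss drops below a fixed threshold,
# capped at 5 conditional saves per run.

CE_THRESHOLD = 0.06      # assumed cutoff for a conditional save
MAX_CONDITIONAL = 5      # matches the "Conditional saves: n/5" counter
SAVE_EVERY_EPOCHS = 5    # periodic checkpoints at epochs 109, 114, 119, ...

def checkpoint_actions(epoch, iteration, loss_ce, conditional_count,
                       iters_per_epoch=125):
    """Return the checkpoint filenames this training step would trigger."""
    actions = []
    # Periodic save: once every SAVE_EVERY_EPOCHS epochs.
    if iteration % (SAVE_EVERY_EPOCHS * iters_per_epoch) == 0:
        actions.append(f"epoch_{epoch}_iter_{iteration}.pth")
    # Conditional save: low cross-entropy loss, limited to MAX_CONDITIONAL.
    if loss_ce < CE_THRESHOLD and conditional_count < MAX_CONDITIONAL:
        actions.append(
            f"LOW_CE_epoch_{epoch}_iter_{iteration}_loss_{loss_ce:.4f}.pth"
        )
    return actions
```

For example, the LOW_CE save at iteration 14250 (loss 0.0575) would correspond to `checkpoint_actions(113, 14250, 0.0575, 1)` under these assumptions, while iteration 16250 triggers both rules at once, as in the log.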
iteration 16260 : loss : 0.090450, loss_ce: 0.122334
iteration 16270 : loss : 0.076960, loss_ce: 0.097848
iteration 16280 : loss : 0.073675, loss_ce: 0.099683
iteration 16290 : loss : 0.064167, loss_ce: 0.081372
iteration 16300 : loss : 0.079101, loss_ce: 0.098748
iteration 16310 : loss : 0.072562, loss_ce: 0.087485
iteration 16320 : loss : 0.067947, loss_ce: 0.085817
iteration 16330 : loss : 0.073566, loss_ce: 0.092224
iteration 16340 : loss : 0.088282, loss_ce: 0.118619
iteration 16350 : loss : 0.087615, loss_ce: 0.110200
iteration 16360 : loss : 0.059736, loss_ce: 0.065002
iteration 16370 : loss : 0.082859, loss_ce: 0.108371
iteration 16380 : loss : 0.082155, loss_ce: 0.107240
iteration 16390 : loss : 0.074728, loss_ce: 0.092198
iteration 16400 : loss : 0.058619, loss_ce: 0.074204
iteration 16410 : loss : 0.087867, loss_ce: 0.121033
iteration 16420 : loss : 0.056309, loss_ce: 0.067510
iteration 16430 : loss : 0.069594, loss_ce: 0.091124
iteration 16440 : loss : 0.084190, loss_ce: 0.104663
iteration 16450 : loss : 0.064139, loss_ce: 0.077695
iteration 16460 : loss : 0.069817, loss_ce: 0.084187
iteration 16470 : loss : 0.068347, loss_ce: 0.082028
iteration 16480 : loss : 0.079070, loss_ce: 0.101195
iteration 16490 : loss : 0.088625, loss_ce: 0.119256
iteration 16500 : loss : 0.088176, loss_ce: 0.090239
iteration 16510 : loss : 0.090351, loss_ce: 0.109413
iteration 16520 : loss : 0.079274, loss_ce: 0.102035
iteration 16530 : loss : 0.076083, loss_ce: 0.084852
iteration 16540 : loss : 0.061296, loss_ce: 0.074215
iteration 16550 : loss : 0.081309, loss_ce: 0.111031
iteration 16560 : loss : 0.072965, loss_ce: 0.088756
iteration 16570 : loss : 0.066555, loss_ce: 0.091359
iteration 16580 : loss : 0.072205, loss_ce: 0.093041
iteration 16590 : loss : 0.065513, loss_ce: 0.078209
iteration 16600 : loss : 0.085411, loss_ce: 0.102692
iteration 16610 : loss : 0.064296, loss_ce: 0.079511
iteration 16620 : loss : 0.067553, loss_ce: 0.078948
iteration 16630 : loss : 0.076920, loss_ce: 0.102552
iteration 16640 : loss : 0.070780, loss_ce: 0.088133
iteration 16650 : loss : 0.074211, loss_ce: 0.092237
iteration 16660 : loss : 0.107039, loss_ce: 0.142382
iteration 16670 : loss : 0.081193, loss_ce: 0.100365
iteration 16680 : loss : 0.057706, loss_ce: 0.067899
iteration 16690 : loss : 0.080431, loss_ce: 0.103781
iteration 16700 : loss : 0.072207, loss_ce: 0.085820
iteration 16710 : loss : 0.055457, loss_ce: 0.063716
iteration 16720 : loss : 0.059778, loss_ce: 0.073245
iteration 16730 : loss : 0.077537, loss_ce: 0.103537
iteration 16740 : loss : 0.073042, loss_ce: 0.095468
iteration 16750 : loss : 0.093749, loss_ce: 0.120619
iteration 16760 : loss : 0.071946, loss_ce: 0.087619
iteration 16770 : loss : 0.061884, loss_ce: 0.078259
iteration 16780 : loss : 0.073942, loss_ce: 0.091815
iteration 16790 : loss : 0.079029, loss_ce: 0.098516
iteration 16800 : loss : 0.062238, loss_ce: 0.074243
iteration 16810 : loss : 0.083456, loss_ce: 0.108164
iteration 16820 : loss : 0.085188, loss_ce: 0.117432
iteration 16830 : loss : 0.065460, loss_ce: 0.081377
iteration 16840 : loss : 0.079232, loss_ce: 0.099849
iteration 16850 : loss : 0.076382, loss_ce: 0.100472
iteration 16860 : loss : 0.072158, loss_ce: 0.087083
iteration 16870 : loss : 0.070404, loss_ce: 0.091605
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_134_iter_16875.pth
iteration 16880 : loss : 0.073596, loss_ce: 0.096028
iteration 16890 : loss : 0.081805, loss_ce: 0.107547
iteration 16900 : loss : 0.059094, loss_ce: 0.068931
iteration 16910 : loss : 0.056155, loss_ce: 0.068726
iteration 16920 : loss : 0.066927, loss_ce: 0.081238
iteration 16930 : loss : 0.060794, loss_ce: 0.073375
iteration 16940 : loss : 0.069608, loss_ce: 0.088202
iteration 16950 : loss : 0.068802, loss_ce: 0.085903
iteration 16960 : loss : 0.076564, loss_ce: 0.097135
iteration 16970 : loss : 0.080035, loss_ce: 0.104717
iteration 16980 : loss : 0.089211, loss_ce: 0.114922
iteration 16990 : loss : 0.064368, loss_ce: 0.080654
iteration 17000 : loss : 0.109816, loss_ce: 0.138232
iteration 17010 : loss : 0.086727, loss_ce: 0.109704
iteration 17020 : loss : 0.074904, loss_ce: 0.085000
iteration 17030 : loss : 0.090119, loss_ce: 0.115021
iteration 17040 : loss : 0.070366, loss_ce: 0.084179
iteration 17050 : loss : 0.090135, loss_ce: 0.116310
iteration 17060 : loss : 0.070016, loss_ce: 0.077740
iteration 17070 : loss : 0.076817, loss_ce: 0.097838
iteration 17080 : loss : 0.069924, loss_ce: 0.082485
iteration 17090 : loss : 0.061130, loss_ce: 0.083144
iteration 17100 : loss : 0.062172, loss_ce: 0.077487
iteration 17110 : loss : 0.070105, loss_ce: 0.090639
iteration 17120 : loss : 0.092259, loss_ce: 0.123223
iteration 17130 : loss : 0.068650, loss_ce: 0.081088
iteration 17140 : loss : 0.071598, loss_ce: 0.088545
iteration 17150 : loss : 0.056652, loss_ce: 0.071015
iteration 17160 : loss : 0.061712, loss_ce: 0.069552
iteration 17170 : loss : 0.074872, loss_ce: 0.098796
iteration 17180 : loss : 0.070148, loss_ce: 0.094443
iteration 17190 : loss : 0.069050, loss_ce: 0.086412
iteration 17200 : loss : 0.076936, loss_ce: 0.091260
iteration 17210 : loss : 0.083314, loss_ce: 0.106888
iteration 17220 : loss : 0.071927, loss_ce: 0.088480
iteration 17230 : loss : 0.072796, loss_ce: 0.089065
iteration 17240 : loss : 0.079566, loss_ce: 0.098558
iteration 17250 : loss : 0.052436, loss_ce: 0.039307
iteration 17260 : loss : 0.077574, loss_ce: 0.098248
iteration 17270 : loss : 0.068708, loss_ce: 0.090208
iteration 17280 : loss : 0.065245, loss_ce: 0.082967
iteration 17290 : loss : 0.070799, loss_ce: 0.088387
iteration 17300 : loss : 0.068121, loss_ce: 0.069396
iteration 17310 : loss : 0.081893, loss_ce: 0.109019
iteration 17320 : loss : 0.066271, loss_ce: 0.079089
iteration 17330 : loss : 0.092328, loss_ce: 0.129818
iteration 17340 : loss : 0.082630, loss_ce: 0.110019
iteration 17350 : loss : 0.059502, loss_ce: 0.075682
iteration 17360 : loss : 0.066877, loss_ce: 0.085431
iteration 17370 : loss : 0.076240, loss_ce: 0.091963
iteration 17380 : loss : 0.074108, loss_ce: 0.096293
iteration 17390 : loss : 0.062587, loss_ce: 0.069185
iteration 17400 : loss : 0.077814, loss_ce: 0.096099
iteration 17410 : loss : 0.078501, loss_ce: 0.103322
iteration 17420 : loss : 0.061974, loss_ce: 0.071042
iteration 17430 : loss : 0.050524, loss_ce: 0.062221
iteration 17440 : loss : 0.074466, loss_ce: 0.092959
iteration 17450 : loss : 0.083704, loss_ce: 0.109258
iteration 17460 : loss : 0.068445, loss_ce: 0.089581
iteration 17470 : loss : 0.075008, loss_ce: 0.092849
iteration 17480 : loss : 0.065318, loss_ce: 0.081877
iteration 17490 : loss : 0.070932, loss_ce: 0.087987
iteration 17500 : loss : 0.088033, loss_ce: 0.128012
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_139_iter_17500.pth
iteration 17510 : loss : 0.061275, loss_ce: 0.073368
iteration 17520 : loss : 0.082207, loss_ce: 0.102263
iteration 17530 : loss : 0.074686, loss_ce: 0.092685
iteration 17540 : loss : 0.051889, loss_ce: 0.057575
iteration 17550 : loss : 0.056678, loss_ce: 0.060270
iteration 17560 : loss : 0.065844, loss_ce: 0.080607
iteration 17570 : loss : 0.067721, loss_ce: 0.081643
iteration 17580 : loss : 0.068742, loss_ce: 0.085663
iteration 17590 : loss : 0.072593, loss_ce: 0.098437
iteration 17600 : loss : 0.069765, loss_ce: 0.090485
iteration 17610 : loss : 0.068774, loss_ce: 0.078169
iteration 17620 : loss : 0.066837, loss_ce: 0.085241
iteration 17630 : loss : 0.078926, loss_ce: 0.093099
iteration 17640 : loss : 0.073977, loss_ce: 0.093642
iteration 17650 : loss : 0.065057, loss_ce: 0.079938
iteration 17660 : loss : 0.085342, loss_ce: 0.111269
iteration 17670 : loss : 0.073940, loss_ce: 0.094802
iteration 17680 : loss : 0.075529, loss_ce: 0.097296
iteration 17690 : loss : 0.082665, loss_ce: 0.106524
iteration 17700 : loss : 0.066663, loss_ce: 0.082415
iteration 17710 : loss : 0.078109, loss_ce: 0.096835
iteration 17720 : loss : 0.067800, loss_ce: 0.075529
iteration 17730 : loss : 0.067006, loss_ce: 0.087943
iteration 17740 : loss : 0.075980, loss_ce: 0.093012
iteration 17750 : loss : 0.090976, loss_ce: 0.125770
iteration 17760 : loss : 0.059808, loss_ce: 0.076045
iteration 17770 : loss : 0.075068, loss_ce: 0.100490
iteration 17780 : loss : 0.078525, loss_ce: 0.088948
iteration 17790 : loss : 0.080902, loss_ce: 0.108267
iteration 17800 : loss : 0.063201, loss_ce: 0.064739
iteration 17810 : loss : 0.063335, loss_ce: 0.082478
iteration 17820 : loss : 0.073552, loss_ce: 0.092056
iteration 17830 : loss : 0.059933, loss_ce: 0.075662
iteration 17840 : loss : 0.064821, loss_ce: 0.082132
iteration 17850 : loss : 0.069778, loss_ce: 0.085205
iteration 17860 : loss : 0.065265, loss_ce: 0.080640
iteration 17870 : loss : 0.070419, loss_ce: 0.091151
iteration 17880 : loss : 0.068460, loss_ce: 0.077075
iteration 17890 : loss : 0.077533, loss_ce: 0.100988
iteration 17900 : loss : 0.073172, loss_ce: 0.095428
iteration 17910 : loss : 0.057906, loss_ce: 0.067093
iteration 17920 : loss : 0.067010, loss_ce: 0.083763
iteration 17930 : loss : 0.053973, loss_ce: 0.062987
iteration 17940 : loss : 0.081630, loss_ce: 0.099212
iteration 17950 : loss : 0.072038, loss_ce: 0.100085
iteration 17960 : loss : 0.067276, loss_ce: 0.081310
iteration 17970 : loss : 0.086730, loss_ce: 0.111070
iteration 17980 : loss : 0.066937, loss_ce: 0.080196
iteration 17990 : loss : 0.073193, loss_ce: 0.093418
iteration 18000 : loss : 0.077128, loss_ce: 0.093686
iteration 18010 : loss : 0.067216, loss_ce: 0.082210
iteration 18020 : loss : 0.074484, loss_ce: 0.087691
iteration 18030 : loss : 0.066841, loss_ce: 0.085564
iteration 18040 : loss : 0.056926, loss_ce: 0.071185
iteration 18050 : loss : 0.086015, loss_ce: 0.109619
iteration 18060 : loss : 0.071004, loss_ce: 0.087872
iteration 18070 : loss : 0.073192, loss_ce: 0.092503
iteration 18080 : loss : 0.061915, loss_ce: 0.081268
iteration 18090 : loss : 0.075260, loss_ce: 0.094925
iteration 18100 : loss : 0.081010, loss_ce: 0.098593
iteration 18110 : loss : 0.082627, loss_ce: 0.110867
iteration 18120 : loss : 0.067094, loss_ce: 0.082856
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_144_iter_18125.pth
iteration 18130 : loss : 0.057568, loss_ce: 0.070734
iteration 18140 : loss : 0.087372, loss_ce: 0.113266
iteration 18150 : loss : 0.073205, loss_ce: 0.085254
iteration 18160 : loss : 0.063643, loss_ce: 0.083379
iteration 18170 : loss : 0.072780, loss_ce: 0.088020
iteration 18180 : loss : 0.083930, loss_ce: 0.107377
iteration 18190 : loss : 0.081937, loss_ce: 0.108388
iteration 18200 : loss : 0.049321, loss_ce: 0.059557
iteration 18210 : loss : 0.068320, loss_ce: 0.087649
iteration 18220 : loss : 0.073436, loss_ce: 0.096868
iteration 18230 : loss : 0.071056, loss_ce: 0.088637
iteration 18240 : loss : 0.055765, loss_ce: 0.066813
iteration 18250 : loss : 0.062341, loss_ce: 0.083538
iteration 18260 : loss : 0.052646, loss_ce: 0.063516
iteration 18270 : loss : 0.075790, loss_ce: 0.095716
iteration 18280 : loss : 0.077207, loss_ce: 0.103652
iteration 18290 : loss : 0.077014, loss_ce: 0.100745
iteration 18300 : loss : 0.075514, loss_ce: 0.091819
iteration 18310 : loss : 0.055106, loss_ce: 0.066464
iteration 18320 : loss : 0.059974, loss_ce: 0.078614
iteration 18330 : loss : 0.072038, loss_ce: 0.097706
iteration 18340 : loss : 0.065826, loss_ce: 0.077556
iteration 18350 : loss : 0.081280, loss_ce: 0.104630
iteration 18360 : loss : 0.060467, loss_ce: 0.071894
iteration 18370 : loss : 0.067117, loss_ce: 0.084098
iteration 18380 : loss : 0.059932, loss_ce: 0.064242
iteration 18390 : loss : 0.069516, loss_ce: 0.093994
iteration 18400 : loss : 0.070155, loss_ce: 0.095981
iteration 18410 : loss : 0.055897, loss_ce: 0.066527
iteration 18420 : loss : 0.075125, loss_ce: 0.102129
iteration 18430 : loss : 0.070519, loss_ce: 0.083635
iteration 18440 : loss : 0.061107, loss_ce: 0.079673
iteration 18450 : loss : 0.076921, loss_ce: 0.103213
iteration 18460 : loss : 0.053101, loss_ce: 0.062945
iteration 18470 : loss : 0.062162, loss_ce: 0.079891
iteration 18480 : loss : 0.072605, loss_ce: 0.090285
iteration 18490 : loss : 0.072464, loss_ce: 0.099212
iteration 18500 : loss : 0.103778, loss_ce: 0.133445
iteration 18510 : loss : 0.072918, loss_ce: 0.092632
iteration 18520 : loss : 0.067476, loss_ce: 0.080748
iteration 18530 : loss : 0.062092, loss_ce: 0.080686
iteration 18540 : loss : 0.060276, loss_ce: 0.074624
iteration 18550 : loss : 0.075890, loss_ce: 0.100996
iteration 18560 : loss : 0.084472, loss_ce: 0.113229
iteration 18570 : loss : 0.066343, loss_ce: 0.077929
iteration 18580 : loss : 0.075254, loss_ce: 0.092104
iteration 18590 : loss : 0.060444, loss_ce: 0.076675
iteration 18600 : loss : 0.075709, loss_ce: 0.099463
iteration 18610 : loss : 0.079288, loss_ce: 0.098491
iteration 18620 : loss : 0.070810, loss_ce: 0.090997
iteration 18630 : loss : 0.082591, loss_ce: 0.100572
iteration 18640 : loss : 0.072405, loss_ce: 0.084022
iteration 18650 : loss : 0.062393, loss_ce: 0.079416
iteration 18660 : loss : 0.070880, loss_ce: 0.081503
iteration 18670 : loss : 0.084575, loss_ce: 0.106945
iteration 18680 : loss : 0.070901, loss_ce: 0.086827
iteration 18690 : loss : 0.085327, loss_ce: 0.112948
iteration 18700 : loss : 0.065200, loss_ce: 0.083298
iteration 18710 : loss : 0.069460, loss_ce: 0.079064
iteration 18720 : loss : 0.070064, loss_ce: 0.087386
iteration 18730 : loss : 0.070062, loss_ce: 0.083945
iteration 18740 : loss : 0.074662, loss_ce: 0.098803
iteration 18750 : loss : 0.096832, loss_ce: 0.119541
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_149.pth
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_149_iter_18750.pth
iteration 18760 : loss : 0.068129, loss_ce: 0.085514
iteration 18770 : loss : 0.050984, loss_ce: 0.061910
iteration 18780 : loss : 0.071037, loss_ce: 0.084785
iteration 18790 : loss : 0.065936, loss_ce: 0.083858
iteration 18800 : loss : 0.071484, loss_ce: 0.092055
iteration 18810 : loss : 0.059845, loss_ce: 0.070689
iteration 18820 : loss : 0.064815, loss_ce: 0.080299
iteration 18830 : loss : 0.066495, loss_ce: 0.078880
iteration 18840 : loss : 0.063126, loss_ce: 0.074399
iteration 18850 : loss : 0.061439, loss_ce: 0.074276
iteration 18860 : loss : 0.054689, loss_ce: 0.067283
iteration 18870 : loss : 0.077571, loss_ce: 0.098507
iteration 18880 : loss : 0.080436, loss_ce: 0.107022
iteration 18890 : loss : 0.053689, loss_ce: 0.065210
iteration 18900 : loss : 0.056479, loss_ce: 0.069042
iteration 18910 : loss : 0.067160, loss_ce: 0.081718
iteration 18920 : loss : 0.070377, loss_ce: 0.093227
iteration 18930 : loss : 0.073815, loss_ce: 0.095234
iteration 18940 : loss : 0.058963, loss_ce: 0.072972
iteration 18950 : loss : 0.063904, loss_ce: 0.077143
iteration 18960 : loss : 0.073372, loss_ce: 0.091482
iteration 18970 : loss : 0.072712, loss_ce: 0.092323
iteration 18980 : loss : 0.080125, loss_ce: 0.108567
iteration 18990 : loss : 0.056239, loss_ce: 0.070316
iteration 19000 : loss : 0.083495, loss_ce: 0.098213
iteration 19010 : loss : 0.077150, loss_ce: 0.092991
iteration 19020 : loss : 0.060216, loss_ce: 0.072928
iteration 19030 : loss : 0.062239, loss_ce: 0.077141
iteration 19040 : loss : 0.069006, loss_ce: 0.084731
iteration 19050 : loss : 0.068555, loss_ce: 0.082852
iteration 19060 : loss : 0.057750, loss_ce: 0.075881
iteration 19070 : loss : 0.063434, loss_ce: 0.077367
iteration 19080 : loss : 0.076890, loss_ce: 0.090438
iteration 19090 : loss : 0.056498, loss_ce: 0.069664
iteration 19100 : loss : 0.076456, loss_ce: 0.095655
iteration 19110 : loss : 0.079047, loss_ce: 0.100211
iteration 19120 : loss : 0.074329, loss_ce: 0.090205
iteration 19130 : loss : 0.072447, loss_ce: 0.085971
iteration 19140 : loss : 0.087965, loss_ce: 0.110437
iteration 19150 : loss : 0.082535, loss_ce: 0.110188
iteration 19160 : loss : 0.065618, loss_ce: 0.083311
iteration 19170 : loss : 0.085498, loss_ce: 0.118810
iteration 19180 : loss : 0.078001, loss_ce: 0.096138
iteration 19190 : loss : 0.072945, loss_ce: 0.091835
iteration 19200 : loss : 0.059610, loss_ce: 0.069917
iteration 19210 : loss : 0.054849, loss_ce: 0.062528
iteration 19220 : loss : 0.075271, loss_ce: 0.100347
iteration 19230 : loss : 0.065572, loss_ce: 0.081263
iteration 19240 : loss : 0.071268, loss_ce: 0.089563
iteration 19250 : loss : 0.066762, loss_ce: 0.085614
iteration 19260 : loss : 0.064192, loss_ce: 0.078806
iteration 19270 : loss : 0.071259, loss_ce: 0.095355
iteration 19280 : loss : 0.064458, loss_ce: 0.085886
iteration 19290 : loss : 0.061581, loss_ce: 0.075555
iteration 19300 : loss : 0.068306, loss_ce: 0.080239
iteration 19310 : loss : 0.064309, loss_ce: 0.078418
iteration 19320 : loss : 0.069291, loss_ce: 0.090994
iteration 19330 : loss : 0.064002, loss_ce: 0.076905
iteration 19340 : loss : 0.073445, loss_ce: 0.096074
iteration 19350 : loss : 0.051352, loss_ce: 0.058913
iteration 19360 : loss : 0.058116, loss_ce: 0.067710
iteration 19370 : loss : 0.068917, loss_ce: 0.083837
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_154_iter_19375.pth
iteration 19380 : loss : 0.062163, loss_ce: 0.071150
iteration 19390 : loss : 0.069010, loss_ce: 0.087209
iteration 19400 : loss : 0.056859, loss_ce: 0.070597
iteration 19410 : loss : 0.070341, loss_ce: 0.085768
iteration 19420 : loss : 0.079729, loss_ce: 0.105277
iteration 19430 : loss : 0.086181, loss_ce: 0.116018
iteration 19440 : loss : 0.060559, loss_ce: 0.079838
iteration 19450 : loss : 0.074574, loss_ce: 0.096890
iteration 19460 : loss : 0.061727, loss_ce: 0.070469
iteration 19470 : loss : 0.070108, loss_ce: 0.083801
iteration 19480 : loss : 0.058799, loss_ce: 0.078305
iteration 19490 : loss : 0.065594, loss_ce: 0.084418
iteration 19500 : loss : 0.062577, loss_ce: 0.088711
iteration 19510 : loss : 0.061900, loss_ce: 0.080546
iteration 19520 : loss : 0.063638, loss_ce: 0.082565
iteration 19530 : loss : 0.080073, loss_ce: 0.094374
iteration 19540 : loss : 0.055209, loss_ce: 0.062236
iteration 19550 : loss : 0.063245, loss_ce: 0.077610
iteration 19560 : loss : 0.083040, loss_ce: 0.104924
iteration 19570 : loss : 0.061848, loss_ce: 0.074787
iteration 19580 : loss : 0.060742, loss_ce: 0.071693
iteration 19590 : loss : 0.073067, loss_ce: 0.087430
iteration 19600 : loss : 0.077654, loss_ce: 0.096969
iteration 19610 : loss : 0.080137, loss_ce: 0.108063
iteration 19620 : loss : 0.072147, loss_ce: 0.091257
iteration 19630 : loss : 0.068015, loss_ce: 0.088137
iteration 19640 : loss : 0.076157, loss_ce: 0.090851
iteration 19650 : loss : 0.079044, loss_ce: 0.104489
iteration 19660 : loss : 0.065704, loss_ce: 0.083660
iteration 19670 : loss : 0.076135, loss_ce: 0.098272
iteration 19680 : loss : 0.067180, loss_ce: 0.088338
iteration 19690 : loss : 0.055994, loss_ce: 0.072475
iteration 19700 : loss : 0.066210, loss_ce: 0.083109
iteration 19710 : loss : 0.057633, loss_ce: 0.073981
iteration 19720 : loss : 0.067612, loss_ce: 0.079061
iteration 19730 : loss : 0.063612, loss_ce: 0.079178
iteration 19740 : loss : 0.064773, loss_ce: 0.081985
iteration 19750 : loss : 0.062970, loss_ce: 0.070031
iteration 19760 : loss : 0.053691, loss_ce: 0.059900
iteration 19770 : loss : 0.065418, loss_ce: 0.081395
iteration 19780 : loss : 0.068966, loss_ce: 0.086304
iteration 19790 : loss : 0.057620, loss_ce: 0.065079
iteration 19800 : loss : 0.064653, loss_ce: 0.082706
iteration 19810 : loss : 0.072595, loss_ce: 0.093447
iteration 19820 : loss : 0.070537, loss_ce: 0.091351
iteration 19830 : loss : 0.064536, loss_ce: 0.079054
iteration 19840 : loss : 0.073650, loss_ce: 0.094697
iteration 19850 : loss : 0.065046, loss_ce: 0.083095
iteration 19860 : loss : 0.074461, loss_ce: 0.100902
iteration 19870 : loss : 0.061646, loss_ce: 0.078005
iteration 19880 : loss : 0.076919, loss_ce: 0.100615
iteration 19890 : loss : 0.077781, loss_ce: 0.096074
iteration 19900 : loss : 0.089498, loss_ce: 0.116847
iteration 19910 : loss : 0.070751, loss_ce: 0.088603
iteration 19920 : loss : 0.066670, loss_ce: 0.082883
iteration 19930 : loss : 0.061800, loss_ce: 0.074670
iteration 19940 : loss : 0.067870, loss_ce: 0.081626
iteration 19950 : loss : 0.062888, loss_ce: 0.079357
iteration 19960 : loss : 0.063010, loss_ce: 0.069430
iteration 19970 : loss : 0.060573, loss_ce: 0.074275
iteration 19980 : loss : 0.072484, loss_ce: 0.099134
iteration 19990 : loss : 0.079856, loss_ce: 0.103286
iteration 20000 : loss : 0.056107, loss_ce: 0.073853
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_159_iter_20000.pth
iteration 20010 : loss : 0.065374, loss_ce: 0.075788
iteration 20020 : loss : 0.063074, loss_ce: 0.076889
iteration 20030 : loss : 0.092967, loss_ce: 0.121855
iteration 20040 : loss : 0.077570, loss_ce: 0.087814
iteration 20050 : loss : 0.085233, loss_ce: 0.111858
iteration 20060 : loss : 0.071642, loss_ce: 0.090151
iteration 20070 : loss : 0.093052, loss_ce: 0.112554
iteration 20080 : loss : 0.065149, loss_ce: 0.079682
iteration 20090 : loss : 0.056097, loss_ce: 0.071072
iteration 20100 : loss : 0.071787, loss_ce: 0.095017
iteration 20110 : loss : 0.048791, loss_ce: 0.056274
iteration 20120 : loss : 0.084815, loss_ce: 0.105779
iteration 20130 : loss : 0.064240, loss_ce: 0.082266
iteration 20140 : loss : 0.076964, loss_ce: 0.098887
iteration 20150 : loss : 0.067684, loss_ce: 0.085239
iteration 20160 : loss : 0.070300, loss_ce: 0.089954
iteration 20170 : loss : 0.070204, loss_ce: 0.085053
iteration 20180 : loss : 0.074266, loss_ce: 0.090330
iteration 20190 : loss : 0.078669, loss_ce: 0.102494
iteration 20200 : loss : 0.087342, loss_ce: 0.118634
iteration 20210 : loss : 0.077554, loss_ce: 0.102796
iteration 20220 : loss : 0.076877, loss_ce: 0.096903
iteration 20230 : loss : 0.056277, loss_ce: 0.067311
iteration 20240 : loss : 0.056666, loss_ce: 0.067782
iteration 20250 : loss : 0.063112, loss_ce: 0.080190
iteration 20260 : loss : 0.080788, loss_ce: 0.107148
iteration 20270 : loss : 0.060504, loss_ce: 0.073667
iteration 20280 : loss : 0.066996, loss_ce: 0.086011
iteration 20290 : loss : 0.066052, loss_ce: 0.086925
iteration 20300 : loss : 0.067393, loss_ce: 0.091689
iteration 20310 : loss : 0.075342, loss_ce: 0.098131
iteration 20320 : loss : 0.067553, loss_ce: 0.075974
iteration 20330 : loss : 0.082162, loss_ce: 0.105519
iteration 20340 : loss : 0.066941, loss_ce: 0.085058
iteration 20350 : loss : 0.056671, loss_ce: 0.069271
iteration 20360 : loss : 0.072536, loss_ce: 0.087931
iteration 20366 : loss : 0.058708, loss_ce: 0.066837
iteration 20367 : loss : 0.076123, loss_ce: 0.098745
iteration 20368 : loss : 0.063106, loss_ce: 0.080087
iteration 20369 : loss : 0.086341, loss_ce: 0.111325
iteration 20370 : loss : 0.068307, loss_ce: 0.083657
iteration 20371 : loss : 0.076436, loss_ce: 0.091486
iteration 20372 : loss : 0.109337, loss_ce: 0.151054
iteration 20373 : loss : 0.074243, loss_ce: 0.098360
iteration 20374 : loss : 0.069033, loss_ce: 0.083821
iteration 20375 : loss : 0.068603, loss_ce: 0.092129
save model to model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_162.pth
------Training Stats------
Training finished in 15089.67 seconds (251.49 minutes).
Average time per iteration: 0.74s/it
Average loss: 0.0768

Testing¶

In [45]:
def calculate_metric_percase(pred, gt): # Bug in original code fixed
    pred[pred > 0] = 1
    gt[gt > 0] = 1
    
    max_distance = np.sqrt(224**2 + 224**2)  # image diagonal, ~316.78 px
    
    if pred.sum() > 0 and gt.sum() > 0:
        dice = metric.binary.dc(pred, gt)
        hd95 = metric.binary.hd95(pred, gt)
        return dice, hd95
    elif pred.sum() > 0 and gt.sum() == 0:
        # False positives: prediction has buildings but ground truth doesn't
        return 0, max_distance  # worst-case HD95 for a 224x224 tile (~316.78)
    elif pred.sum() == 0 and gt.sum() > 0:
        # False negatives: ground truth has buildings but prediction doesn't
        return 0, max_distance
    else:
        # Both prediction and ground truth are empty - perfect agreement
        return 1, 0

def test_single_volume(image_tensor, label_tensor, net, classes, patch_size=[224, 224], test_save_path=None, case=None):
    # Gemini Pro 2.5 rewrote most of this function in order to debug an issue with the original code and fix dimension problems
    
    # image_tensor is image_batch from DataLoader, shape (1, 3, 224, 224), PyTorch tensor on CPU
    # label_tensor is label_batch from DataLoader, shape (1, 1, 224, 224), PyTorch tensor on CPU

    # Determine device from model:
    device = next(net.parameters()).device

    # Prepare input for the network: move to device, ensure correct dtype
    input_for_net = image_tensor.to(device).float()

    # Prepare label for metrics: convert to NumPy, squeeze batch and channel dimensions.
    label_np_for_metrics = label_tensor.squeeze(0).squeeze(0).cpu().detach().numpy()

    net.eval() # Set model to evaluation mode

    with torch.no_grad(): # Disable gradient calculations
        # Forward pass: input_for_net is 4D (1, 3, 224, 224)
        outputs = net(input_for_net)  # Expected output shape: (1, num_classes, H, W)

        # Get prediction: apply softmax, then argmax. Output shape (1, H, W) then (H,W)
        prediction_np = torch.argmax(torch.softmax(outputs, dim=1), dim=1).squeeze(0).cpu().detach().numpy()

    metric_list = []
    # Ensure class 0 (background) is not included if your classes are 0 and 1 and metrics are for foreground (class 1)
    for i in range(1, classes): # Iterates only for foreground class if classes=2 (0=bg, 1=fg)
        # Both prediction_np and label_np_for_metrics are (H, W)
        metric_list.append(calculate_metric_percase(prediction_np == i, label_np_for_metrics == i))

    if test_save_path is not None and case is not None:
        # image_tensor is (1, 3, 224, 224). Squeeze batch dim -> (3, 224, 224)
        image_np_for_saving = image_tensor.squeeze(0).cpu().detach().numpy()

        # Ensure SimpleITK gets NumPy arrays with correct types
        img_itk = sitk.GetImageFromArray(image_np_for_saving.astype(np.float32))
        prd_itk = sitk.GetImageFromArray(prediction_np.astype(np.float32))
        lab_itk = sitk.GetImageFromArray(label_np_for_metrics.astype(np.float32))

        # The test_save_path directory is created before calling inference.
        sitk.WriteImage(prd_itk, os.path.join(test_save_path, str(case) + "_pred.nii.gz"))
        sitk.WriteImage(img_itk, os.path.join(test_save_path, str(case) + "_img.nii.gz"))
        sitk.WriteImage(lab_itk, os.path.join(test_save_path, str(case) + "_gt.nii.gz"))

    return metric_list    
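As a sanity check on the metric conventions above, here is a minimal pure-NumPy sketch (avoiding the medpy dependency) of the Dice coefficient and of the empty-mask fallback used by `calculate_metric_percase`; `dice_np` is a hypothetical stand-in for `metric.binary.dc`:

```python
import numpy as np

def dice_np(pred, gt):
    # Dice = 2|P ∩ G| / (|P| + |G|); NumPy stand-in for metric.binary.dc
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

gt = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(round(dice_np(pred, gt), 4))  # → 0.6667

# A one-sided miss (prediction or ground truth empty, but not both) is
# penalised with the tile diagonal as a worst-case HD95, which is the
# 316.783838 value that appears for the failed tiles in the test log.
max_distance = float(np.sqrt(224**2 + 224**2))
print(round(max_distance, 6))  # → 316.783838
```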
In [46]:
import argparse
import logging
import os
import random
import sys
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm
from medpy import metric
from scipy.ndimage import zoom
import SimpleITK as sitk
In [47]:
parser = argparse.ArgumentParser()

parser.add_argument('--dataset', type=str, default='GF7')
parser.add_argument('--num_classes', type=int, default=2)
parser.add_argument('--max_iterations', type=int, default=30000)
parser.add_argument('--max_epochs', type=int, default=8)
parser.add_argument('--batch_size', type=int, default=4)
parser.add_argument('--n_gpu', type=int, default=1)
parser.add_argument('--is_savenii', action="store_true", help='whether to save results during inference', default=True)
parser.add_argument('--deterministic', type=int, default=1) # Make it 1 for reproducibility
parser.add_argument('--base_lr', type=float, default=0.01)
parser.add_argument('--img_size', type=int, default=224)
parser.add_argument('--seed', type=int, default=42)
parser.add_argument('--n_skip', type=int, default=3)
parser.add_argument('--vit_name', type=str, default='R50-ViT-B_16')
parser.add_argument('--vit_patches_size', type=int, default=16)
parser.add_argument('--test_save_dir', type=str, default='predictions', help='saving prediction as nii!')

# Add these two for GF7Dataset
parser.add_argument('--image_dir', type=str, help='Path to satellite images')
parser.add_argument('--mask_dir', type=str, help='Path to segmentation masks')

# Fall back to a default if `epc` was not set earlier in the notebook
if 'epc' not in globals() or epc is None:
    epc = '50'  # Default value for max_epochs

# epc = "XX"  # Set this to the desired number of epochs to select a specific model

# Parse args manually for notebook
args = parser.parse_args(args=[
    '--dataset', 'GF7',
    '--num_classes', '2',
    '--max_epochs', epc, # Must match the trained checkpoint (163 for this run)
    '--batch_size', '25',
    '--n_gpu', '1',
    '--base_lr', '0.001',
    '--img_size', '224',
    '--seed', '42',
    '--n_skip', '3',
    '--vit_name', 'R50-ViT-B_16',
    '--vit_patches_size', '16',
    '--test_save_dir', 'predictions',
    '--is_savenii',  # Only the flag is needed
    '--image_dir', 'data/GF-7 Building (3Bands)/Test/image', # Change this Back    
    '--mask_dir', 'data/GF-7 Building (3Bands)/Test/label' # Change This Back
])

print(args)
Namespace(dataset='GF7', num_classes=2, max_iterations=30000, max_epochs=163, batch_size=25, n_gpu=1, is_savenii=True, deterministic=1, base_lr=0.001, img_size=224, seed=42, n_skip=3, vit_name='R50-ViT-B_16', vit_patches_size=16, test_save_dir='predictions', image_dir='data/GF-7 Building (3Bands)/Test/image', mask_dir='data/GF-7 Building (3Bands)/Test/label')
In [48]:
def inference(args, model, test_save_path=None):
    print("\n\nStarting Inference...")
    print("Test Save Path:", test_save_path)
    

    # Use GF7Dataset
    db_test = GF7Dataset(
        image_dir=args.image_dir,
        mask_dir=args.mask_dir,
        image_size=args.img_size,
        transform=None  # No transform for inference
    )
    
    testloader = DataLoader(db_test, batch_size=1, shuffle=False, num_workers=0)
    print("The length of test set is: {}".format(len(db_test)))
    
    logging.info("{} test iterations per epoch".format(len(testloader)))
    
    model.eval() # Sets PyTorch Model to evaluation mode
    metric_list = 0.0
    metric_list_full = []
        
    with tqdm(total=len(testloader), desc="Testing", ncols=500, leave=True) as pbar:

        for i_batch, (image_batch, label_batch) in  enumerate(testloader):
            h, w = image_batch.size()[2:]
            metric_i = test_single_volume(image_batch, label_batch, model, classes=args.num_classes, patch_size=[args.img_size, args.img_size],
                                        test_save_path=test_save_path, case=str(i_batch))
            metric_list += np.array(metric_i)
            image_filename = os.path.basename(db_test.image_paths[i_batch])
            metric_list_full.append({'filename': image_filename, 'metrics': metric_i})
            
            # Log Every 15 iterations
            if i_batch % 15 == 0:
                logging.info('idx %d case %s mean_dice %f mean_hd95 %f' % (i_batch, str(i_batch), np.mean(metric_i, axis=0)[0], np.mean(metric_i, axis=0)[1]))
                
            pbar.update(1)
        metric_list = metric_list / len(db_test)    
    
    for i in range(1, args.num_classes):
        logging.info('Mean class %d mean_dice %f mean_hd95 %f' % (i, metric_list[i-1][0], metric_list[i-1][1]))
    
    performance = np.mean(metric_list, axis=0)[0]
    mean_hd95 = np.mean(metric_list, axis=0)[1]
    logging.info('Testing performance in best model: mean_dice : %f mean_hd95 : %f' % (performance, mean_hd95))  
    
    print('\n\n Testing Finished!')  

    return metric_list_full
In [49]:
if not args.deterministic:
    cudnn.benchmark = True
    cudnn.deterministic = False
else:
    cudnn.benchmark = False
    cudnn.deterministic = True

random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
torch.cuda.manual_seed(args.seed)

# -----------------------
# Dataset Configuration
# -----------------------
dataset_name = 'GF7'
dataset_config = {
    'GF7': {
        'image_dir': args.image_dir,
        'mask_dir': args.mask_dir,
        'num_classes': 2
    }
}

if args.batch_size != 24 and args.batch_size % 6 == 0:
    args.base_lr *= args.batch_size / 24

args.dataset = dataset_name
args.num_classes = dataset_config[dataset_name]['num_classes']
args.image_dir = dataset_config[dataset_name]['image_dir']
args.mask_dir = dataset_config[dataset_name]['mask_dir']
args.is_pretrain = True 

args.exp = 'TU_' + dataset_name + str(args.img_size)
snapshot_path = "model/{}/{}".format(args.exp, 'TU')
snapshot_path = snapshot_path + '_pretrain' if args.is_pretrain else snapshot_path
snapshot_path += f"_{args.vit_name}_skip{args.n_skip}"
snapshot_path = snapshot_path + '_vitpatch' + str(args.vit_patches_size) if args.vit_patches_size!=16 else snapshot_path
snapshot_path = snapshot_path + '_epo' + str(args.max_epochs) if args.max_epochs != 30 else snapshot_path
snapshot_path = snapshot_path+'_bs'+str(args.batch_size)
snapshot_path = snapshot_path + '_lr' + str(args.base_lr) if args.base_lr != 0.01 else snapshot_path
snapshot_path = snapshot_path + '_'+str(args.img_size)
snapshot_path = snapshot_path + '_s'+str(args.seed) if args.seed!=1234 else snapshot_path

# Create snapshot directory
if not os.path.exists(snapshot_path):
    os.makedirs(snapshot_path)

# -----------------------
# ViT Config and Model
# -----------------------
config_vit = CONFIGS[args.vit_name]
config_vit.n_classes = args.num_classes
config_vit.n_skip = args.n_skip
# This is an Addition That needs to be Checked and Look at the Train As well
config_vit.patches.size = (args.vit_patches_size, args.vit_patches_size)

if 'R50' in args.vit_name:
    grid_size = int(args.img_size / args.vit_patches_size)
    config_vit.patches.grid = (grid_size, grid_size)

# Build model
net = VisionTransformer(config_vit, img_size=args.img_size, num_classes=config_vit.n_classes).to(device)

snapshot = os.path.join(snapshot_path, 'best_model.pth')
if not os.path.exists(snapshot): snapshot = snapshot.replace('best_model', 'epoch_'+str(args.max_epochs-1))
net.load_state_dict(torch.load(snapshot.replace('\\', '/')))
snapshot_name = snapshot_path.split('/')[-1]

log_folder = './test_log/test_log_' + args.exp
os.makedirs(log_folder, exist_ok=True)
logging.basicConfig(filename=log_folder + '/'+snapshot_name+".txt", level=logging.INFO, format='[%(asctime)s.%(msecs)03d] %(message)s', datefmt='%H:%M:%S')
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))
logging.info(str(args))
logging.info(snapshot_name)

if args.is_savenii:
    args.test_save_dir = 'predictions'
    test_save_path = os.path.join(args.test_save_dir, args.exp, snapshot_name)
    os.makedirs(test_save_path, exist_ok=True)
else:
    test_save_path = None


test_results = inference(args, net, test_save_path)
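As an aside on the learning-rate adjustment in the cell above: `base_lr` is rescaled linearly with batch size, but only when the batch size is a multiple of 6 (and not 24). A small sketch of that rule, with `scaled_lr` a hypothetical helper:

```python
# Sketch of the linear LR-scaling rule above: scale base_lr by
# batch_size / 24 only when batch_size is a multiple of 6 (and != 24).
def scaled_lr(base_lr, batch_size):
    if batch_size != 24 and batch_size % 6 == 0:
        base_lr *= batch_size / 24
    return base_lr

print(scaled_lr(0.001, 25))  # 25 is not a multiple of 6 -> unchanged, 0.001
print(scaled_lr(0.01, 48))   # scaled: 0.01 * 48/24 -> 0.02
```

With `batch_size=25` here the condition fails, which is why the snapshot path below keeps `lr0.001`.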
Namespace(dataset='GF7', num_classes=2, max_iterations=30000, max_epochs=163, batch_size=25, n_gpu=1, is_savenii=True, deterministic=1, base_lr=0.001, img_size=224, seed=42, n_skip=3, vit_name='R50-ViT-B_16', vit_patches_size=16, test_save_dir='predictions', image_dir='data/GF-7 Building (3Bands)/Test/image', mask_dir='data/GF-7 Building (3Bands)/Test/label', is_pretrain=True, exp='TU_GF7224')
TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42
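The snapshot name logged here can be re-derived from the run's hyperparameters; a sketch mirroring the path-building logic in the cell above (`snapshot_name` is a hypothetical helper, and the `vit_patches_size` branch is omitted since patch size 16 is the default):

```python
# Hypothetical helper mirroring the snapshot-path construction above;
# the defaults are the hyperparameters used in this run.
def snapshot_name(vit_name="R50-ViT-B_16", n_skip=3, max_epochs=163,
                  batch_size=25, base_lr=0.001, img_size=224, seed=42):
    name = f"TU_pretrain_{vit_name}_skip{n_skip}"
    if max_epochs != 30:
        name += f"_epo{max_epochs}"
    name += f"_bs{batch_size}"
    if base_lr != 0.01:
        name += f"_lr{base_lr}"
    name += f"_{img_size}"
    if seed != 1234:
        name += f"_s{seed}"
    return name

print(snapshot_name())
# → TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42
```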


Starting Inference...
Test Save Path: predictions\TU_GF7224\TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42
The length of test set is: 1035
1035 test iterations per epoch
idx 0 case 0 mean_dice 0.929350 mean_hd95 8.944272
idx 15 case 15 mean_dice 0.914220 mean_hd95 6.324555
idx 30 case 30 mean_dice 0.800000 mean_hd95 9.337485
idx 45 case 45 mean_dice 0.842473 mean_hd95 68.000000
idx 60 case 60 mean_dice 0.784546 mean_hd95 8.062258
idx 75 case 75 mean_dice 0.621569 mean_hd95 37.913413
idx 90 case 90 mean_dice 1.000000 mean_hd95 0.000000
idx 105 case 105 mean_dice 0.810729 mean_hd95 5.099020
idx 120 case 120 mean_dice 0.782405 mean_hd95 8.062258
idx 135 case 135 mean_dice 0.865285 mean_hd95 4.242641
idx 150 case 150 mean_dice 0.802120 mean_hd95 7.155089
idx 165 case 165 mean_dice 0.678227 mean_hd95 7.071068
idx 180 case 180 mean_dice 0.855857 mean_hd95 6.000000
idx 195 case 195 mean_dice 0.752144 mean_hd95 36.228031
idx 210 case 210 mean_dice 0.829974 mean_hd95 5.000000
idx 225 case 225 mean_dice 1.000000 mean_hd95 0.000000
idx 240 case 240 mean_dice 0.909986 mean_hd95 16.000000
idx 255 case 255 mean_dice 0.859009 mean_hd95 10.440307
idx 270 case 270 mean_dice 0.806375 mean_hd95 2.236068
idx 285 case 285 mean_dice 0.805513 mean_hd95 18.343830
idx 300 case 300 mean_dice 0.926138 mean_hd95 8.892980
idx 315 case 315 mean_dice 0.783092 mean_hd95 3.605551
idx 330 case 330 mean_dice 0.747378 mean_hd95 4.000000
idx 345 case 345 mean_dice 0.928267 mean_hd95 7.615773
idx 360 case 360 mean_dice 0.978875 mean_hd95 24.634171
idx 375 case 375 mean_dice 0.559461 mean_hd95 24.000000
idx 390 case 390 mean_dice 0.867404 mean_hd95 3.605551
idx 405 case 405 mean_dice 0.820702 mean_hd95 11.133097
idx 420 case 420 mean_dice 0.000000 mean_hd95 316.783838
idx 435 case 435 mean_dice 1.000000 mean_hd95 0.000000
idx 450 case 450 mean_dice 1.000000 mean_hd95 0.000000
idx 465 case 465 mean_dice 0.863992 mean_hd95 5.385165
idx 480 case 480 mean_dice 0.890754 mean_hd95 9.211336
idx 495 case 495 mean_dice 0.778914 mean_hd95 12.136657
idx 510 case 510 mean_dice 0.826446 mean_hd95 1.269239
idx 525 case 525 mean_dice 0.854645 mean_hd95 5.099020
idx 540 case 540 mean_dice 1.000000 mean_hd95 0.000000
idx 555 case 555 mean_dice 0.793263 mean_hd95 5.000000
idx 570 case 570 mean_dice 0.928225 mean_hd95 7.280110
idx 585 case 585 mean_dice 0.869702 mean_hd95 3.162278
idx 600 case 600 mean_dice 0.765902 mean_hd95 38.208501
idx 615 case 615 mean_dice 0.873792 mean_hd95 3.605551
idx 630 case 630 mean_dice 0.821119 mean_hd95 9.219544
idx 645 case 645 mean_dice 0.779711 mean_hd95 5.830952
idx 660 case 660 mean_dice 0.851018 mean_hd95 4.000000
idx 675 case 675 mean_dice 0.880381 mean_hd95 2.236068
idx 690 case 690 mean_dice 0.736128 mean_hd95 14.406547
idx 705 case 705 mean_dice 0.854046 mean_hd95 15.578802
idx 720 case 720 mean_dice 0.808097 mean_hd95 17.000000
idx 735 case 735 mean_dice 0.749478 mean_hd95 16.278821
idx 750 case 750 mean_dice 0.823380 mean_hd95 5.830952
idx 765 case 765 mean_dice 0.867295 mean_hd95 3.162278
idx 780 case 780 mean_dice 0.810423 mean_hd95 7.000000
idx 795 case 795 mean_dice 0.606787 mean_hd95 15.626816
idx 810 case 810 mean_dice 0.700588 mean_hd95 21.931712
idx 825 case 825 mean_dice 0.923583 mean_hd95 10.037407
idx 840 case 840 mean_dice 0.781846 mean_hd95 4.000000
idx 855 case 855 mean_dice 0.792012 mean_hd95 10.158431
idx 870 case 870 mean_dice 0.884778 mean_hd95 5.000000
idx 885 case 885 mean_dice 0.829584 mean_hd95 13.084221
idx 900 case 900 mean_dice 0.910737 mean_hd95 4.242641
idx 915 case 915 mean_dice 0.000000 mean_hd95 316.783838
idx 930 case 930 mean_dice 0.819588 mean_hd95 5.830952
idx 945 case 945 mean_dice 1.000000 mean_hd95 0.000000
idx 960 case 960 mean_dice 1.000000 mean_hd95 0.000000
idx 975 case 975 mean_dice 0.000000 mean_hd95 316.783838
idx 990 case 990 mean_dice 0.843447 mean_hd95 7.071068
idx 1005 case 1005 mean_dice 0.787607 mean_hd95 9.992443
idx 1020 case 1020 mean_dice 0.873135 mean_hd95 4.000000
Mean class 1 mean_dice 0.751649 mean_hd95 27.417227
Testing performance in best model: mean_dice : 0.751649 mean_hd95 : 27.417227


 Testing Finished!

Analysing the Model Results¶

In [50]:
test_results

# Histogram of Dice Coefficient
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np


# Convert Test Results to Pandas DataFrame
test_results_df = pd.DataFrame([
    {
        'Filename': d['filename'],
        'Dice Coefficient': d['metrics'][0][0],
        'HD95': d['metrics'][0][1]
    }
    for d in test_results
])

test_results_df['City'] = test_results_df['Filename'].apply(lambda x: x.split('_')[0])
test_results_df['Image'] = test_results_df['Filename'].apply(lambda x: x.split('_')[1])


print("Median Scores:", test_results_df[['Dice Coefficient', 'HD95']].median())
print("Upper Quartile Scores:", test_results_df[['Dice Coefficient', 'HD95']].quantile(0.75))
print("Lower Quartile Scores:", test_results_df[['Dice Coefficient', 'HD95']].quantile(0.25))


# Plot histogram of Dice Coefficient
plt.figure(figsize=(10, 6))
sns.histplot(test_results_df['Dice Coefficient'], bins=20, kde=True)
plt.title('Histogram of Dice Coefficient')
plt.xlabel('Dice Coefficient')
plt.ylabel('Frequency')
plt.grid()
plt.show()

# Plot histogram of HD95
plt.figure(figsize=(10, 6))
sns.histplot(test_results_df['HD95'], bins=20, kde=True, color='orange')
plt.title('Histogram of HD95')
plt.xlabel('HD95')
plt.ylabel('Frequency')
plt.grid()
plt.show()
Median Scores: Dice Coefficient    0.815338
HD95                8.000000
dtype: float64
Upper Quartile Scores: Dice Coefficient     0.869990
HD95                16.855625
Name: 0.75, dtype: float64
Lower Quartile Scores: Dice Coefficient    0.728367
HD95                5.000000
Name: 0.25, dtype: float64
[Figure: histogram of Dice Coefficient across test tiles]
[Figure: histogram of HD95 across test tiles]

Scores By City¶

In [51]:
# Cities present in the test set
test_results_df['City'].unique()
Out[51]:
array(['Chongqing', 'Guangzhou', 'Lanzhou', 'Ningbo', 'Shenzhen',
       'Tianjin'], dtype=object)
In [52]:
# Scores By City
by_city_result = test_results_df.groupby('City').agg({
    'Dice Coefficient': ['mean', 'std'],
    'HD95': ['mean', 'std']
}).reset_index()

# Rename columns for clarity
by_city_result.columns = ['City', 'Dice Coefficient Mean', 'Dice Coefficient Std', 'HD95 Mean', 'HD95 Std']

# Add Row for Overall
overall_row = pd.DataFrame({
    'City': ['Overall'],
    'Dice Coefficient Mean': [test_results_df['Dice Coefficient'].mean()],
    'Dice Coefficient Std': [test_results_df['Dice Coefficient'].std()],
    'HD95 Mean': [test_results_df['HD95'].mean()],
    'HD95 Std': [test_results_df['HD95'].std()],
})

by_city_result = pd.concat([by_city_result, overall_row], ignore_index=True)

# Display the results as a rendered table
print("Scores by City:")
display(by_city_result)

# KDE plot of Dice Coefficient by City
plt.figure(figsize=(10, 6))
sns.kdeplot(data=test_results_df, x='Dice Coefficient', hue='City')
plt.title('KDE Plot of Dice Coefficient by City')
plt.xlabel('Dice Coefficient')
plt.ylabel('Density')
plt.grid()
plt.tight_layout()
plt.show()

# KDE plot of HD95 by City
plt.figure(figsize=(10, 6))
sns.kdeplot(data=test_results_df, x='HD95', hue='City')
plt.title('KDE Plot of HD95 by City')
plt.xlabel('HD95')
plt.ylabel('Density')
plt.grid()
plt.tight_layout()
plt.show()
Scores by City:
City       Dice Coefficient Mean  Dice Coefficient Std  HD95 Mean  HD95 Std
Chongqing  0.744319               0.222668              26.693538  59.478517
Guangzhou  0.722874               0.227968              29.421031  56.687649
Lanzhou    0.766654               0.249940              33.305769  74.004024
Ningbo     0.785484               0.146918              15.573055  23.415616
Shenzhen   0.742552               0.176550              23.645208  51.723232
Tianjin    0.752744               0.262597              34.507415  79.728446
Overall    0.751649               0.220762              27.417227  61.267373
[Figure: KDE plot of Dice Coefficient by city]
[Figure: KDE plot of HD95 by city]

First 5 and Last 5 Cases¶

In [53]:
def plot_prediction(idx, title):
    # Load image, prediction, and ground truth from saved .nii.gz files
    img_path = os.path.join(test_save_path, f"{idx}_img.nii.gz")
    pred_path = os.path.join(test_save_path, f"{idx}_pred.nii.gz")
    gt_path = os.path.join(test_save_path, f"{idx}_gt.nii.gz")

    img = sitk.GetArrayFromImage(sitk.ReadImage(img_path))
    pred = sitk.GetArrayFromImage(sitk.ReadImage(pred_path))
    gt = sitk.GetArrayFromImage(sitk.ReadImage(gt_path))

    plt.figure(figsize=(15, 4))
    plt.subplot(1, 3, 1)
    if img.shape[0] == 3:
        # Normalize to [0, 1] for RGB display
        img_disp = img.astype(np.float32)
        img_disp = (img_disp - img_disp.min()) / (img_disp.max() - img_disp.min() + 1e-8)
        plt.imshow(img_disp.transpose(1, 2, 0))
    else:
        plt.imshow(img[0], cmap='gray')
    plt.title("Image")
    plt.axis('off')

    plt.subplot(1, 3, 2)
    plt.imshow(pred, cmap='gray')
    plt.title("Prediction")
    plt.axis('off')

    plt.subplot(1, 3, 3)
    plt.imshow(gt, cmap='gray')
    plt.title("Ground Truth")
    plt.axis('off')
    
    # Add city, Dice Coefficient, and HD95 to the figure title
    dice = test_results_df.iloc[idx]['Dice Coefficient']
    hd95 = test_results_df.iloc[idx]['HD95']
    city = test_results_df.iloc[idx]['City']

    plt.suptitle(f"{title} : {city} - Dice: {dice:.4f}, HD95: {hd95:.4f}", fontsize=16)
    # Increase padding between suptitle and subplots
    plt.subplots_adjust(top=0.85)
    plt.tight_layout()
    
    plt.show()
    
# Plot predictions for the first 5 cases
for i in range(5):
    plot_prediction(i, f"Prediction for Case {i}")
# Plot predictions for the last 5 cases
for i in range(len(test_results)-5, len(test_results)):
    plot_prediction(i, f"Prediction for Case {i}")
[Figures: image / prediction / ground-truth panels for the first 5 and last 5 test cases]

Dice Score 1 vs 0¶

In [54]:
# Best 5 and Worst 5 Cases images

# Get the indices of the best and worst 5 cases based on Dice Coefficient
best_5_cases = test_results_df['Dice Coefficient'].nlargest(5).index
worst_5_cases = test_results_df['Dice Coefficient'].nsmallest(5).index

# Plot the best 5 cases
print("Best 5 Cases (Highest Dice Coefficient):", best_5_cases.tolist())
for i in best_5_cases:
    plot_prediction(i, f"Best Case {i}")

# Plot the worst 5 cases
print("Worst 5 Cases (Lowest Dice Coefficient):", worst_5_cases.tolist())
for i in worst_5_cases:
    plot_prediction(i, f"Worst Case {i}")
Best 5 Cases (Highest Dice Coefficient): [6, 18, 23, 53, 58]
[Figures: image / prediction / ground-truth panels for the 5 best cases]
Worst 5 Cases (Lowest Dice Coefficient): [13, 66, 76, 79, 85]
[Figures: image / prediction / ground-truth panels for the 5 worst cases]

Best and Worst Scores Ignoring 0 and 1 Scores¶

In [55]:
# Best 5 and Worst 5 Cases images

# Get the indices of the best and worst 5 cases based on Dice Coefficient, ignoring perfect (1.0) and zero (0.0) scores
filtered_df = test_results_df[(test_results_df['Dice Coefficient'] < 1.0) & (test_results_df['Dice Coefficient'] > 0.0)]
best_5_cases = filtered_df['Dice Coefficient'].nlargest(5).index
worst_5_cases = filtered_df['Dice Coefficient'].nsmallest(5).index

# Plot the best 5 cases
print("Best 5 Cases (Highest Dice Coefficient):")
for i in best_5_cases:
    plot_prediction(i, f"Best Case {i}")

# Plot the worst 5 cases
print("Worst 5 Cases (Lowest Dice Coefficient):")
for i in worst_5_cases:
    plot_prediction(i, f"Worst Case {i}")
Best 5 Cases (Highest Dice Coefficient):
[Figures: image / prediction / ground-truth panels for the 5 best cases, excluding perfect scores]
Worst 5 Cases (Lowest Dice Coefficient):
[Figures: image / prediction / ground-truth panels for the 5 worst cases, excluding zero scores]

Model Structure¶

In [41]:
from torchsummary import summary

# Visualize the VisionTransformer model structure
# Use a sample input size matching your data (e.g., 3x224x224 for RGB images)
summary(net, input_size=(3, args.img_size, args.img_size), device=str(device))


# from torchviz import make_dot
# import graphviz

# # Create a dummy input matching model's input shape
# dummy_input = torch.randn(1, 3, args.img_size, args.img_size).to(device)

# # Forward pass to get the output
# output = net(dummy_input)

# # Create the visualization
# dot = make_dot(output, params=dict(net.named_parameters()))
# dot.render("model_visualization", format="pdf")
===========================================================================
Layer (type:depth-idx)                             Param #
===========================================================================
ā”œā”€Transformer: 1-1                                 --
|    └─Embeddings: 2-1                             --
|    |    └─ResNetV2: 3-1                          11,894,848
|    |    └─Conv2d: 3-2                            787,200
|    |    └─Dropout: 3-3                           --
|    └─Encoder: 2-2                                --
|    |    └─ModuleList: 3-4                        85,054,464
|    |    └─LayerNorm: 3-5                         1,536
ā”œā”€DecoderCup: 1-2                                  --
|    └─Conv2dReLU: 2-3                             --
|    |    └─Conv2d: 3-6                            3,538,944
|    |    └─BatchNorm2d: 3-7                       1,024
|    |    └─ReLU: 3-8                              --
|    └─ModuleList: 2-4                             --
|    |    └─DecoderBlock: 3-9                      2,950,144
|    |    └─DecoderBlock: 3-10                     737,792
|    |    └─DecoderBlock: 3-11                     147,712
|    |    └─DecoderBlock: 3-12                     11,584
ā”œā”€SegmentationHead: 1-3                            --
|    └─Conv2d: 2-5                                 290
|    └─Identity: 2-6                               --
===========================================================================
Total params: 105,125,538
Trainable params: 105,125,538
Non-trainable params: 0
===========================================================================

Detailed Breakdown¶

In [42]:
summary(net, input_size=(3, args.img_size, args.img_size), device=str(device), depth=6)
================================================================================
Layer (type:depth-idx)                                  Param #
================================================================================
ā”œā”€Transformer: 1-1                                      --
|    └─Embeddings: 2-1                                  --
|    |    └─ResNetV2: 3-1                               --
|    |    |    └─Sequential: 4-1                        --
|    |    |    |    └─StdConv2d: 5-1                    9,408
|    |    |    |    └─GroupNorm: 5-2                    128
|    |    |    |    └─ReLU: 5-3                         --
|    |    |    └─Sequential: 4-2                        --
|    |    |    |    └─Sequential: 5-4                   --
|    |    |    |    |    └─PreActBottleneck: 6-1        75,008
|    |    |    |    |    └─PreActBottleneck: 6-2        70,400
|    |    |    |    |    └─PreActBottleneck: 6-3        70,400
|    |    |    |    └─Sequential: 5-5                   --
|    |    |    |    |    └─PreActBottleneck: 6-4        379,392
|    |    |    |    |    └─PreActBottleneck: 6-5        280,064
|    |    |    |    |    └─PreActBottleneck: 6-6        280,064
|    |    |    |    |    └─PreActBottleneck: 6-7        280,064
|    |    |    |    └─Sequential: 5-6                   --
|    |    |    |    |    └─PreActBottleneck: 6-8        1,512,448
|    |    |    |    |    └─PreActBottleneck: 6-9        1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-10       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-11       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-12       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-13       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-14       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-15       1,117,184
|    |    |    |    |    └─PreActBottleneck: 6-16       1,117,184
|    |    └─Conv2d: 3-2                                 787,200
|    |    └─Dropout: 3-3                                --
|    └─Encoder: 2-2                                     --
|    |    └─ModuleList: 3-4                             --
|    |    |    └─Block: 4-3                             --
|    |    |    |    └─LayerNorm: 5-7                    1,536
|    |    |    |    └─LayerNorm: 5-8                    1,536
|    |    |    |    └─Mlp: 5-9                          --
|    |    |    |    |    └─Linear: 6-17                 2,362,368
|    |    |    |    |    └─Linear: 6-18                 2,360,064
|    |    |    |    |    └─Dropout: 6-19                --
|    |    |    |    └─Attention: 5-10                   --
|    |    |    |    |    └─Linear: 6-20                 590,592
|    |    |    |    |    └─Linear: 6-21                 590,592
|    |    |    |    |    └─Linear: 6-22                 590,592
|    |    |    |    |    └─Linear: 6-23                 590,592
|    |    |    |    |    └─Dropout: 6-24                --
|    |    |    |    |    └─Dropout: 6-25                --
|    |    |    |    |    └─Softmax: 6-26                --
|    |    |    └─Block: 4-4                             --
|    |    |    |    └─LayerNorm: 5-11                   1,536
|    |    |    |    └─LayerNorm: 5-12                   1,536
|    |    |    |    └─Mlp: 5-13                         --
|    |    |    |    |    └─Linear: 6-27                 2,362,368
|    |    |    |    |    └─Linear: 6-28                 2,360,064
|    |    |    |    |    └─Dropout: 6-29                --
|    |    |    |    └─Attention: 5-14                   --
|    |    |    |    |    └─Linear: 6-30                 590,592
|    |    |    |    |    └─Linear: 6-31                 590,592
|    |    |    |    |    └─Linear: 6-32                 590,592
|    |    |    |    |    └─Linear: 6-33                 590,592
|    |    |    |    |    └─Dropout: 6-34                --
|    |    |    |    |    └─Dropout: 6-35                --
|    |    |    |    |    └─Softmax: 6-36                --
|    |    |    └─Block: 4-5                             --
|    |    |    |    └─LayerNorm: 5-15                   1,536
|    |    |    |    └─LayerNorm: 5-16                   1,536
|    |    |    |    └─Mlp: 5-17                         --
|    |    |    |    |    └─Linear: 6-37                 2,362,368
|    |    |    |    |    └─Linear: 6-38                 2,360,064
|    |    |    |    |    └─Dropout: 6-39                --
|    |    |    |    └─Attention: 5-18                   --
|    |    |    |    |    └─Linear: 6-40                 590,592
|    |    |    |    |    └─Linear: 6-41                 590,592
|    |    |    |    |    └─Linear: 6-42                 590,592
|    |    |    |    |    └─Linear: 6-43                 590,592
|    |    |    |    |    └─Dropout: 6-44                --
|    |    |    |    |    └─Dropout: 6-45                --
|    |    |    |    |    └─Softmax: 6-46                --
|    |    |    └─Block: 4-6                             --
|    |    |    |    └─LayerNorm: 5-19                   1,536
|    |    |    |    └─LayerNorm: 5-20                   1,536
|    |    |    |    └─Mlp: 5-21                         --
|    |    |    |    |    └─Linear: 6-47                 2,362,368
|    |    |    |    |    └─Linear: 6-48                 2,360,064
|    |    |    |    |    └─Dropout: 6-49                --
|    |    |    |    └─Attention: 5-22                   --
|    |    |    |    |    └─Linear: 6-50                 590,592
|    |    |    |    |    └─Linear: 6-51                 590,592
|    |    |    |    |    └─Linear: 6-52                 590,592
|    |    |    |    |    └─Linear: 6-53                 590,592
|    |    |    |    |    └─Dropout: 6-54                --
|    |    |    |    |    └─Dropout: 6-55                --
|    |    |    |    |    └─Softmax: 6-56                --
|    |    |    └─Block: 4-7                             --
|    |    |    |    └─LayerNorm: 5-23                   1,536
|    |    |    |    └─LayerNorm: 5-24                   1,536
|    |    |    |    └─Mlp: 5-25                         --
|    |    |    |    |    └─Linear: 6-57                 2,362,368
|    |    |    |    |    └─Linear: 6-58                 2,360,064
|    |    |    |    |    └─Dropout: 6-59                --
|    |    |    |    └─Attention: 5-26                   --
|    |    |    |    |    └─Linear: 6-60                 590,592
|    |    |    |    |    └─Linear: 6-61                 590,592
|    |    |    |    |    └─Linear: 6-62                 590,592
|    |    |    |    |    └─Linear: 6-63                 590,592
|    |    |    |    |    └─Dropout: 6-64                --
|    |    |    |    |    └─Dropout: 6-65                --
|    |    |    |    |    └─Softmax: 6-66                --
|    |    |    └─Block: 4-8                             --
|    |    |    |    └─LayerNorm: 5-27                   1,536
|    |    |    |    └─LayerNorm: 5-28                   1,536
|    |    |    |    └─Mlp: 5-29                         --
|    |    |    |    |    └─Linear: 6-67                 2,362,368
|    |    |    |    |    └─Linear: 6-68                 2,360,064
|    |    |    |    |    └─Dropout: 6-69                --
|    |    |    |    └─Attention: 5-30                   --
|    |    |    |    |    └─Linear: 6-70                 590,592
|    |    |    |    |    └─Linear: 6-71                 590,592
|    |    |    |    |    └─Linear: 6-72                 590,592
|    |    |    |    |    └─Linear: 6-73                 590,592
|    |    |    |    |    └─Dropout: 6-74                --
|    |    |    |    |    └─Dropout: 6-75                --
|    |    |    |    |    └─Softmax: 6-76                --
|    |    |    └─Block: 4-9                             --
|    |    |    |    └─LayerNorm: 5-31                   1,536
|    |    |    |    └─LayerNorm: 5-32                   1,536
|    |    |    |    └─Mlp: 5-33                         --
|    |    |    |    |    └─Linear: 6-77                 2,362,368
|    |    |    |    |    └─Linear: 6-78                 2,360,064
|    |    |    |    |    └─Dropout: 6-79                --
|    |    |    |    └─Attention: 5-34                   --
|    |    |    |    |    └─Linear: 6-80                 590,592
|    |    |    |    |    └─Linear: 6-81                 590,592
|    |    |    |    |    └─Linear: 6-82                 590,592
|    |    |    |    |    └─Linear: 6-83                 590,592
|    |    |    |    |    └─Dropout: 6-84                --
|    |    |    |    |    └─Dropout: 6-85                --
|    |    |    |    |    └─Softmax: 6-86                --
|    |    |    └─Block: 4-10                            --
|    |    |    |    └─LayerNorm: 5-35                   1,536
|    |    |    |    └─LayerNorm: 5-36                   1,536
|    |    |    |    └─Mlp: 5-37                         --
|    |    |    |    |    └─Linear: 6-87                 2,362,368
|    |    |    |    |    └─Linear: 6-88                 2,360,064
|    |    |    |    |    └─Dropout: 6-89                --
|    |    |    |    └─Attention: 5-38                   --
|    |    |    |    |    └─Linear: 6-90                 590,592
|    |    |    |    |    └─Linear: 6-91                 590,592
|    |    |    |    |    └─Linear: 6-92                 590,592
|    |    |    |    |    └─Linear: 6-93                 590,592
|    |    |    |    |    └─Dropout: 6-94                --
|    |    |    |    |    └─Dropout: 6-95                --
|    |    |    |    |    └─Softmax: 6-96                --
|    |    |    └─Block: 4-11                            --
|    |    |    |    └─LayerNorm: 5-39                   1,536
|    |    |    |    └─LayerNorm: 5-40                   1,536
|    |    |    |    └─Mlp: 5-41                         --
|    |    |    |    |    └─Linear: 6-97                 2,362,368
|    |    |    |    |    └─Linear: 6-98                 2,360,064
|    |    |    |    |    └─Dropout: 6-99                --
|    |    |    |    └─Attention: 5-42                   --
|    |    |    |    |    └─Linear: 6-100                590,592
|    |    |    |    |    └─Linear: 6-101                590,592
|    |    |    |    |    └─Linear: 6-102                590,592
|    |    |    |    |    └─Linear: 6-103                590,592
|    |    |    |    |    └─Dropout: 6-104               --
|    |    |    |    |    └─Dropout: 6-105               --
|    |    |    |    |    └─Softmax: 6-106               --
|    |    |    └─Block: 4-12                            --
|    |    |    |    └─LayerNorm: 5-43                   1,536
|    |    |    |    └─LayerNorm: 5-44                   1,536
|    |    |    |    └─Mlp: 5-45                         --
|    |    |    |    |    └─Linear: 6-107                2,362,368
|    |    |    |    |    └─Linear: 6-108                2,360,064
|    |    |    |    |    └─Dropout: 6-109               --
|    |    |    |    └─Attention: 5-46                   --
|    |    |    |    |    └─Linear: 6-110                590,592
|    |    |    |    |    └─Linear: 6-111                590,592
|    |    |    |    |    └─Linear: 6-112                590,592
|    |    |    |    |    └─Linear: 6-113                590,592
|    |    |    |    |    └─Dropout: 6-114               --
|    |    |    |    |    └─Dropout: 6-115               --
|    |    |    |    |    └─Softmax: 6-116               --
|    |    |    └─Block: 4-13                            --
|    |    |    |    └─LayerNorm: 5-47                   1,536
|    |    |    |    └─LayerNorm: 5-48                   1,536
|    |    |    |    └─Mlp: 5-49                         --
|    |    |    |    |    └─Linear: 6-117                2,362,368
|    |    |    |    |    └─Linear: 6-118                2,360,064
|    |    |    |    |    └─Dropout: 6-119               --
|    |    |    |    └─Attention: 5-50                   --
|    |    |    |    |    └─Linear: 6-120                590,592
|    |    |    |    |    └─Linear: 6-121                590,592
|    |    |    |    |    └─Linear: 6-122                590,592
|    |    |    |    |    └─Linear: 6-123                590,592
|    |    |    |    |    └─Dropout: 6-124               --
|    |    |    |    |    └─Dropout: 6-125               --
|    |    |    |    |    └─Softmax: 6-126               --
|    |    |    └─Block: 4-14                            --
|    |    |    |    └─LayerNorm: 5-51                   1,536
|    |    |    |    └─LayerNorm: 5-52                   1,536
|    |    |    |    └─Mlp: 5-53                         --
|    |    |    |    |    └─Linear: 6-127                2,362,368
|    |    |    |    |    └─Linear: 6-128                2,360,064
|    |    |    |    |    └─Dropout: 6-129               --
|    |    |    |    └─Attention: 5-54                   --
|    |    |    |    |    └─Linear: 6-130                590,592
|    |    |    |    |    └─Linear: 6-131                590,592
|    |    |    |    |    └─Linear: 6-132                590,592
|    |    |    |    |    └─Linear: 6-133                590,592
|    |    |    |    |    └─Dropout: 6-134               --
|    |    |    |    |    └─Dropout: 6-135               --
|    |    |    |    |    └─Softmax: 6-136               --
|    |    └─LayerNorm: 3-5                              1,536
ā”œā”€DecoderCup: 1-2                                       --
|    └─Conv2dReLU: 2-3                                  --
|    |    └─Conv2d: 3-6                                 3,538,944
|    |    └─BatchNorm2d: 3-7                            1,024
|    |    └─ReLU: 3-8                                   --
|    └─ModuleList: 2-4                                  --
|    |    └─DecoderBlock: 3-9                           --
|    |    |    └─Conv2dReLU: 4-15                       --
|    |    |    |    └─Conv2d: 5-55                      2,359,296
|    |    |    |    └─BatchNorm2d: 5-56                 512
|    |    |    |    └─ReLU: 5-57                        --
|    |    |    └─Conv2dReLU: 4-16                       --
|    |    |    |    └─Conv2d: 5-58                      589,824
|    |    |    |    └─BatchNorm2d: 5-59                 512
|    |    |    |    └─ReLU: 5-60                        --
|    |    |    └─UpsamplingBilinear2d: 4-17             --
|    |    └─DecoderBlock: 3-10                          --
|    |    |    └─Conv2dReLU: 4-18                       --
|    |    |    |    └─Conv2d: 5-61                      589,824
|    |    |    |    └─BatchNorm2d: 5-62                 256
|    |    |    |    └─ReLU: 5-63                        --
|    |    |    └─Conv2dReLU: 4-19                       --
|    |    |    |    └─Conv2d: 5-64                      147,456
|    |    |    |    └─BatchNorm2d: 5-65                 256
|    |    |    |    └─ReLU: 5-66                        --
|    |    |    └─UpsamplingBilinear2d: 4-20             --
|    |    └─DecoderBlock: 3-11                          --
|    |    |    └─Conv2dReLU: 4-21                       --
|    |    |    |    └─Conv2d: 5-67                      110,592
|    |    |    |    └─BatchNorm2d: 5-68                 128
|    |    |    |    └─ReLU: 5-69                        --
|    |    |    └─Conv2dReLU: 4-22                       --
|    |    |    |    └─Conv2d: 5-70                      36,864
|    |    |    |    └─BatchNorm2d: 5-71                 128
|    |    |    |    └─ReLU: 5-72                        --
|    |    |    └─UpsamplingBilinear2d: 4-23             --
|    |    └─DecoderBlock: 3-12                          --
|    |    |    └─Conv2dReLU: 4-24                       --
|    |    |    |    └─Conv2d: 5-73                      9,216
|    |    |    |    └─BatchNorm2d: 5-74                 32
|    |    |    |    └─ReLU: 5-75                        --
|    |    |    └─Conv2dReLU: 4-25                       --
|    |    |    |    └─Conv2d: 5-76                      2,304
|    |    |    |    └─BatchNorm2d: 5-77                 32
|    |    |    |    └─ReLU: 5-78                        --
|    |    |    └─UpsamplingBilinear2d: 4-26             --
ā”œā”€SegmentationHead: 1-3                                 --
|    └─Conv2d: 2-5                                      290
|    └─Identity: 2-6                                    --
================================================================================
Total params: 105,125,538
Trainable params: 105,125,538
Non-trainable params: 0
================================================================================
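
The per-layer counts above can be sanity-checked by hand. Assuming the standard ViT-B configuration used by the ResNet-50 ViT-B/16 checkpoint (hidden size 768, MLP dimension 3072, 12 transformer blocks), a quick back-of-the-envelope sketch reproduces the figures in the summary:

```python
# Sanity-check the torchsummary parameter counts for the ViT-B encoder.
# Assumed config: hidden size 768, MLP dim 3072, 12 blocks (standard ViT-B).
hidden, mlp_dim, blocks = 768, 3072, 12

def linear_params(fan_in, fan_out):
    """Parameters of a Linear layer: weight matrix plus bias."""
    return fan_in * fan_out + fan_out

attn_proj = linear_params(hidden, hidden)   # each Q/K/V/output projection
mlp_fc1   = linear_params(hidden, mlp_dim)  # first MLP Linear
mlp_fc2   = linear_params(mlp_dim, hidden)  # second MLP Linear
layernorm = 2 * hidden                      # scale + shift per LayerNorm

# One transformer block: 4 projections, 2 MLP Linears, 2 LayerNorms
block = 4 * attn_proj + mlp_fc1 + mlp_fc2 + 2 * layernorm

print(attn_proj)        # matches the 590,592 Linear rows above
print(mlp_fc1, mlp_fc2) # matches 2,362,368 and 2,360,064
print(blocks * block)   # matches the 85,054,464 Encoder ModuleList total
```

The same arithmetic explains the patch-embedding Conv2d: a 1Ɨ1 convolution from the ResNet's 1024 output channels to the 768-dimensional token space has 1024 Ā· 768 + 768 = 787,200 parameters, exactly the `Conv2d: 3-2` row.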
|    |    |    |    |    └─Softmax: 6-46                --
|    |    |    └─Block: 4-6                             --
|    |    |    |    └─LayerNorm: 5-19                   1,536
|    |    |    |    └─LayerNorm: 5-20                   1,536
|    |    |    |    └─Mlp: 5-21                         --
|    |    |    |    |    └─Linear: 6-47                 2,362,368
|    |    |    |    |    └─Linear: 6-48                 2,360,064
|    |    |    |    |    └─Dropout: 6-49                --
|    |    |    |    └─Attention: 5-22                   --
|    |    |    |    |    └─Linear: 6-50                 590,592
|    |    |    |    |    └─Linear: 6-51                 590,592
|    |    |    |    |    └─Linear: 6-52                 590,592
|    |    |    |    |    └─Linear: 6-53                 590,592
|    |    |    |    |    └─Dropout: 6-54                --
|    |    |    |    |    └─Dropout: 6-55                --
|    |    |    |    |    └─Softmax: 6-56                --
|    |    |    └─Block: 4-7                             --
|    |    |    |    └─LayerNorm: 5-23                   1,536
|    |    |    |    └─LayerNorm: 5-24                   1,536
|    |    |    |    └─Mlp: 5-25                         --
|    |    |    |    |    └─Linear: 6-57                 2,362,368
|    |    |    |    |    └─Linear: 6-58                 2,360,064
|    |    |    |    |    └─Dropout: 6-59                --
|    |    |    |    └─Attention: 5-26                   --
|    |    |    |    |    └─Linear: 6-60                 590,592
|    |    |    |    |    └─Linear: 6-61                 590,592
|    |    |    |    |    └─Linear: 6-62                 590,592
|    |    |    |    |    └─Linear: 6-63                 590,592
|    |    |    |    |    └─Dropout: 6-64                --
|    |    |    |    |    └─Dropout: 6-65                --
|    |    |    |    |    └─Softmax: 6-66                --
|    |    |    └─Block: 4-8                             --
|    |    |    |    └─LayerNorm: 5-27                   1,536
|    |    |    |    └─LayerNorm: 5-28                   1,536
|    |    |    |    └─Mlp: 5-29                         --
|    |    |    |    |    └─Linear: 6-67                 2,362,368
|    |    |    |    |    └─Linear: 6-68                 2,360,064
|    |    |    |    |    └─Dropout: 6-69                --
|    |    |    |    └─Attention: 5-30                   --
|    |    |    |    |    └─Linear: 6-70                 590,592
|    |    |    |    |    └─Linear: 6-71                 590,592
|    |    |    |    |    └─Linear: 6-72                 590,592
|    |    |    |    |    └─Linear: 6-73                 590,592
|    |    |    |    |    └─Dropout: 6-74                --
|    |    |    |    |    └─Dropout: 6-75                --
|    |    |    |    |    └─Softmax: 6-76                --
|    |    |    └─Block: 4-9                             --
|    |    |    |    └─LayerNorm: 5-31                   1,536
|    |    |    |    └─LayerNorm: 5-32                   1,536
|    |    |    |    └─Mlp: 5-33                         --
|    |    |    |    |    └─Linear: 6-77                 2,362,368
|    |    |    |    |    └─Linear: 6-78                 2,360,064
|    |    |    |    |    └─Dropout: 6-79                --
|    |    |    |    └─Attention: 5-34                   --
|    |    |    |    |    └─Linear: 6-80                 590,592
|    |    |    |    |    └─Linear: 6-81                 590,592
|    |    |    |    |    └─Linear: 6-82                 590,592
|    |    |    |    |    └─Linear: 6-83                 590,592
|    |    |    |    |    └─Dropout: 6-84                --
|    |    |    |    |    └─Dropout: 6-85                --
|    |    |    |    |    └─Softmax: 6-86                --
|    |    |    └─Block: 4-10                            --
|    |    |    |    └─LayerNorm: 5-35                   1,536
|    |    |    |    └─LayerNorm: 5-36                   1,536
|    |    |    |    └─Mlp: 5-37                         --
|    |    |    |    |    └─Linear: 6-87                 2,362,368
|    |    |    |    |    └─Linear: 6-88                 2,360,064
|    |    |    |    |    └─Dropout: 6-89                --
|    |    |    |    └─Attention: 5-38                   --
|    |    |    |    |    └─Linear: 6-90                 590,592
|    |    |    |    |    └─Linear: 6-91                 590,592
|    |    |    |    |    └─Linear: 6-92                 590,592
|    |    |    |    |    └─Linear: 6-93                 590,592
|    |    |    |    |    └─Dropout: 6-94                --
|    |    |    |    |    └─Dropout: 6-95                --
|    |    |    |    |    └─Softmax: 6-96                --
|    |    |    └─Block: 4-11                            --
|    |    |    |    └─LayerNorm: 5-39                   1,536
|    |    |    |    └─LayerNorm: 5-40                   1,536
|    |    |    |    └─Mlp: 5-41                         --
|    |    |    |    |    └─Linear: 6-97                 2,362,368
|    |    |    |    |    └─Linear: 6-98                 2,360,064
|    |    |    |    |    └─Dropout: 6-99                --
|    |    |    |    └─Attention: 5-42                   --
|    |    |    |    |    └─Linear: 6-100                590,592
|    |    |    |    |    └─Linear: 6-101                590,592
|    |    |    |    |    └─Linear: 6-102                590,592
|    |    |    |    |    └─Linear: 6-103                590,592
|    |    |    |    |    └─Dropout: 6-104               --
|    |    |    |    |    └─Dropout: 6-105               --
|    |    |    |    |    └─Softmax: 6-106               --
|    |    |    └─Block: 4-12                            --
|    |    |    |    └─LayerNorm: 5-43                   1,536
|    |    |    |    └─LayerNorm: 5-44                   1,536
|    |    |    |    └─Mlp: 5-45                         --
|    |    |    |    |    └─Linear: 6-107                2,362,368
|    |    |    |    |    └─Linear: 6-108                2,360,064
|    |    |    |    |    └─Dropout: 6-109               --
|    |    |    |    └─Attention: 5-46                   --
|    |    |    |    |    └─Linear: 6-110                590,592
|    |    |    |    |    └─Linear: 6-111                590,592
|    |    |    |    |    └─Linear: 6-112                590,592
|    |    |    |    |    └─Linear: 6-113                590,592
|    |    |    |    |    └─Dropout: 6-114               --
|    |    |    |    |    └─Dropout: 6-115               --
|    |    |    |    |    └─Softmax: 6-116               --
|    |    |    └─Block: 4-13                            --
|    |    |    |    └─LayerNorm: 5-47                   1,536
|    |    |    |    └─LayerNorm: 5-48                   1,536
|    |    |    |    └─Mlp: 5-49                         --
|    |    |    |    |    └─Linear: 6-117                2,362,368
|    |    |    |    |    └─Linear: 6-118                2,360,064
|    |    |    |    |    └─Dropout: 6-119               --
|    |    |    |    └─Attention: 5-50                   --
|    |    |    |    |    └─Linear: 6-120                590,592
|    |    |    |    |    └─Linear: 6-121                590,592
|    |    |    |    |    └─Linear: 6-122                590,592
|    |    |    |    |    └─Linear: 6-123                590,592
|    |    |    |    |    └─Dropout: 6-124               --
|    |    |    |    |    └─Dropout: 6-125               --
|    |    |    |    |    └─Softmax: 6-126               --
|    |    |    └─Block: 4-14                            --
|    |    |    |    └─LayerNorm: 5-51                   1,536
|    |    |    |    └─LayerNorm: 5-52                   1,536
|    |    |    |    └─Mlp: 5-53                         --
|    |    |    |    |    └─Linear: 6-127                2,362,368
|    |    |    |    |    └─Linear: 6-128                2,360,064
|    |    |    |    |    └─Dropout: 6-129               --
|    |    |    |    └─Attention: 5-54                   --
|    |    |    |    |    └─Linear: 6-130                590,592
|    |    |    |    |    └─Linear: 6-131                590,592
|    |    |    |    |    └─Linear: 6-132                590,592
|    |    |    |    |    └─Linear: 6-133                590,592
|    |    |    |    |    └─Dropout: 6-134               --
|    |    |    |    |    └─Dropout: 6-135               --
|    |    |    |    |    └─Softmax: 6-136               --
|    |    └─LayerNorm: 3-5                              1,536
ā”œā”€DecoderCup: 1-2                                       --
|    └─Conv2dReLU: 2-3                                  --
|    |    └─Conv2d: 3-6                                 3,538,944
|    |    └─BatchNorm2d: 3-7                            1,024
|    |    └─ReLU: 3-8                                   --
|    └─ModuleList: 2-4                                  --
|    |    └─DecoderBlock: 3-9                           --
|    |    |    └─Conv2dReLU: 4-15                       --
|    |    |    |    └─Conv2d: 5-55                      2,359,296
|    |    |    |    └─BatchNorm2d: 5-56                 512
|    |    |    |    └─ReLU: 5-57                        --
|    |    |    └─Conv2dReLU: 4-16                       --
|    |    |    |    └─Conv2d: 5-58                      589,824
|    |    |    |    └─BatchNorm2d: 5-59                 512
|    |    |    |    └─ReLU: 5-60                        --
|    |    |    └─UpsamplingBilinear2d: 4-17             --
|    |    └─DecoderBlock: 3-10                          --
|    |    |    └─Conv2dReLU: 4-18                       --
|    |    |    |    └─Conv2d: 5-61                      589,824
|    |    |    |    └─BatchNorm2d: 5-62                 256
|    |    |    |    └─ReLU: 5-63                        --
|    |    |    └─Conv2dReLU: 4-19                       --
|    |    |    |    └─Conv2d: 5-64                      147,456
|    |    |    |    └─BatchNorm2d: 5-65                 256
|    |    |    |    └─ReLU: 5-66                        --
|    |    |    └─UpsamplingBilinear2d: 4-20             --
|    |    └─DecoderBlock: 3-11                          --
|    |    |    └─Conv2dReLU: 4-21                       --
|    |    |    |    └─Conv2d: 5-67                      110,592
|    |    |    |    └─BatchNorm2d: 5-68                 128
|    |    |    |    └─ReLU: 5-69                        --
|    |    |    └─Conv2dReLU: 4-22                       --
|    |    |    |    └─Conv2d: 5-70                      36,864
|    |    |    |    └─BatchNorm2d: 5-71                 128
|    |    |    |    └─ReLU: 5-72                        --
|    |    |    └─UpsamplingBilinear2d: 4-23             --
|    |    └─DecoderBlock: 3-12                          --
|    |    |    └─Conv2dReLU: 4-24                       --
|    |    |    |    └─Conv2d: 5-73                      9,216
|    |    |    |    └─BatchNorm2d: 5-74                 32
|    |    |    |    └─ReLU: 5-75                        --
|    |    |    └─Conv2dReLU: 4-25                       --
|    |    |    |    └─Conv2d: 5-76                      2,304
|    |    |    |    └─BatchNorm2d: 5-77                 32
|    |    |    |    └─ReLU: 5-78                        --
|    |    |    └─UpsamplingBilinear2d: 4-26             --
ā”œā”€SegmentationHead: 1-3                                 --
|    └─Conv2d: 2-5                                      290
|    └─Identity: 2-6                                    --
================================================================================
Total params: 105,125,538
Trainable params: 105,125,538
Non-trainable params: 0
================================================================================
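The trainable/non-trainable split reported in the summary above can be reproduced for any PyTorch module by iterating over its parameters. A minimal sketch with a toy model (the layer sizes here are illustrative, not the actual TransUNet ones):

```python
import torch.nn as nn

# Toy model standing in for the full TransUNet (illustrative sizes only)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),  # 3*3*3*16 weights + 16 biases = 448
    nn.BatchNorm2d(16),               # 16 gammas + 16 betas = 32
    nn.Flatten(),
    nn.Linear(16, 2),                 # 16*2 weights + 2 biases = 34
)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total params: {total:,}")      # 448 + 32 + 34 = 514
print(f"Trainable params: {trainable:,}")
```

Freezing layers (setting `requires_grad = False`) would move their counts into the non-trainable row; here, as in the summary above, everything is trainable.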

Testing Multiple Models¶

InĀ [43]:
import glob
import os


def test_all_models_in_directory(model_directory, test_args):
    """
    Automatically test all .pth files in a directory
    
    Args:
        model_directory: Path to directory containing .pth files
        test_args: Test arguments
    """
    
    # Find all .pth files in the directory
    model_files = glob.glob(os.path.join(model_directory, "*.pth"))
    
    if not model_files:
        print(f"No .pth files found in {model_directory}")
        return {}
    
    print(f"Found {len(model_files)} model files:")
    for file in model_files:
        print(f"  - {os.path.basename(file)}")
    
    all_results = {}
    
    for model_path in model_files:
        model_name = os.path.basename(model_path).replace('.pth', '')
        print(f"\n{'='*50}")
        print(f"Testing Model: {model_name}")
        print(f"{'='*50}")
        
        # Create model
        config_vit = CONFIGS[test_args.vit_name]
        config_vit.n_classes = test_args.num_classes
        config_vit.n_skip = test_args.n_skip
        config_vit.patches.size = (test_args.vit_patches_size, test_args.vit_patches_size)
        
        if 'R50' in test_args.vit_name:
            grid_size = test_args.img_size // test_args.vit_patches_size
            config_vit.patches.grid = (grid_size, grid_size)
        
        net = VisionTransformer(config_vit, img_size=test_args.img_size, 
                               num_classes=config_vit.n_classes).to(device)
        
        # Load model weights
        try:
            print(f"Loading model from: {model_path}")
            net.load_state_dict(torch.load(model_path, map_location=device))
            
            # Run inference
            test_results = run_inference_simple(test_args, net)
            all_results[model_name] = test_results
            
            # Print per-model summary stats
            
            print(f"\n{'='*50}")
            print(f"Results For Model: {model_name}")
            
            dice_scores = [r['dice'] for r in test_results]
            print(f"{model_name}: Mean Dice = {np.mean(dice_scores):.4f}, Std = {np.std(dice_scores):.4f}")
            print(f"{model_name}: Median Dice = {np.median(dice_scores):.4f}, Std = {np.std(dice_scores):.4f}")
            print(f"\n{'-'*50}")
            # HD95 scores
            hd95_scores = [r['hd95'] for r in test_results]
            print(f"{model_name}: Mean HD95 = {np.mean(hd95_scores):.4f}, Std = {np.std(hd95_scores):.4f}")
            print(f"{model_name}: Median HD95 = {np.median(hd95_scores):.4f}, Std = {np.std(hd95_scores):.4f}")
            print(f"{'='*50}")

            
            # Clean up GPU memory
            del net
            torch.cuda.empty_cache()
            
        except Exception as e:
            print(f"Error loading model {model_path}: {e}")
            all_results[model_name] = None
    
    return all_results

def run_inference_simple(args, model):
    """Simplified inference without saving files"""
    print("Starting inference...")
    
    # Create test dataset
    db_test = GF7Dataset(
        image_dir=args.image_dir,
        mask_dir=args.mask_dir,
        image_size=args.img_size,
        transform=None
    )
    
    testloader = DataLoader(db_test, batch_size=1, shuffle=False, num_workers=0)
    print(f"Testing on {len(db_test)} samples")
    
    model.eval()
    all_metrics = []
    
    # Maximum possible distance in a 224x224 image
    max_distance = np.sqrt(224**2 + 224**2)  # ā‰ˆ 316.8
    
    with torch.no_grad():
        for i, (image_batch, label_batch) in enumerate(tqdm(testloader, desc="Testing")):
            # Move to device
            image_batch = image_batch.to(device).float()
            label_np = label_batch.squeeze().cpu().numpy()
            
            # Forward pass
            outputs = model(image_batch)
            pred_np = torch.argmax(torch.softmax(outputs, dim=1), dim=1).squeeze().cpu().numpy()
            
            # Calculate metrics for the foreground (building) class,
            # handling the empty-mask edge cases explicitly
            if pred_np.sum() > 0 and label_np.sum() > 0:
                dice = metric.binary.dc(pred_np, label_np)
                hd95 = metric.binary.hd95(pred_np, label_np)
            elif pred_np.sum() > 0 and label_np.sum() == 0:
                # False positives: predicted buildings where none exist
                dice, hd95 = 0, max_distance
            elif pred_np.sum() == 0 and label_np.sum() > 0:
                # False negatives: missed buildings that should exist
                dice, hd95 = 0, max_distance
            else:
                # Both empty - perfect agreement
                dice, hd95 = 1, 0
            
            all_metrics.append({
                'filename': os.path.basename(db_test.image_paths[i]),
                'dice': dice,
                'hd95': hd95
            })
    
    return all_metrics
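The Dice and HD95 values above come from medpy (`metric.binary.dc` / `metric.binary.hd95`). For reference, both can be sketched in plain NumPy/SciPy; this is an approximation of what medpy computes, not its exact implementation:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

def hd95(pred, target):
    """95th-percentile symmetric boundary distance (approximate)."""
    pred, target = pred.astype(bool), target.astype(bool)
    pred_b = pred & ~binary_erosion(pred)      # boundary pixels of prediction
    targ_b = target & ~binary_erosion(target)  # boundary pixels of ground truth
    # Distance from each boundary pixel of one mask to the other's boundary
    d_to_targ = distance_transform_edt(~targ_b)[pred_b]
    d_to_pred = distance_transform_edt(~pred_b)[targ_b]
    return np.percentile(np.concatenate([d_to_targ, d_to_pred]), 95)
```

Taking the 95th percentile rather than the maximum is what makes HD95 robust to a few outlier boundary pixels, which is why it is preferred over the plain Hausdorff distance for segmentation evaluation.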

def compare_all_models(all_results):
    """Simple comparison of all model results"""
    import pandas as pd
    import matplotlib.pyplot as plt
    
    comparison_data = []
    
    for model_name, results in all_results.items():
        if results is not None:
            dice_scores = [r['dice'] for r in results]
            hd95_scores = [r['hd95'] for r in results]
            
            comparison_data.append({
                'Model': model_name,
                'Mean Dice': np.mean(dice_scores),
                'Std Dice': np.std(dice_scores),
                'Mean HD95': np.mean(hd95_scores),
                'Std HD95': np.std(hd95_scores),
                'Median Dice': np.median(dice_scores),
                'Median HD95': np.median(hd95_scores),
                'Max Dice': np.max(dice_scores),
                'Min Dice': np.min(dice_scores)
            })
    
    # Create comparison DataFrame
    comparison_df = pd.DataFrame(comparison_data)
    
    # Sort by Mean Dice descending
    comparison_df = comparison_df.sort_values('Mean Dice', ascending=False)
    
    print("\nModel Comparison Results (Sorted by Mean Dice):")
    print("="*100)
    display(comparison_df)
    
    # Plot comparison
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Dice Coefficient comparison
    ax1.bar(range(len(comparison_df)), comparison_df['Mean Dice'], 
            yerr=comparison_df['Std Dice'], capsize=5)
    ax1.set_title('Mean Dice Coefficient Comparison')
    ax1.set_ylabel('Dice Coefficient')
    ax1.set_xticks(range(len(comparison_df)))
    ax1.set_xticklabels(comparison_df['Model'], rotation=45, ha='right')
    ax1.grid(True, alpha=0.3)
    
    # HD95 comparison
    ax2.bar(range(len(comparison_df)), comparison_df['Mean HD95'], 
            yerr=comparison_df['Std HD95'], capsize=5)
    ax2.set_title('Mean HD95 Comparison')
    ax2.set_ylabel('HD95')
    ax2.set_xticks(range(len(comparison_df)))
    ax2.set_xticklabels(comparison_df['Model'], rotation=45, ha='right')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return comparison_df

model_directory = "model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42"
test_args = args 


# Run comparison on all models in the directory
print("Testing all models in directory...")
results = test_all_models_in_directory(model_directory, test_args)
comparison = compare_all_models(results)

# Show the best performing model
if not comparison.empty:
    best_model = comparison.iloc[0]
    print(f"\nBest performing model: {best_model['Model']}")
    print(f"Mean Dice: {best_model['Mean Dice']:.4f}")
    print(f"Mean HD95: {best_model['Mean HD95']:.4f}")
Testing all models in directory...
Found 31 model files:
  - epoch_104_iter_13125.pth
  - epoch_109_iter_13750.pth
  - epoch_114_iter_14375.pth
  - epoch_119_iter_15000.pth
  - epoch_124_iter_15625.pth
  - epoch_129_iter_16250.pth
  - epoch_134_iter_16875.pth
  - epoch_139_iter_17500.pth
  - epoch_144_iter_18125.pth
  - epoch_149.pth
  - epoch_149_iter_18750.pth
  - epoch_154_iter_19375.pth
  - epoch_159_iter_20000.pth
  - epoch_162.pth
  - epoch_49_iter_6250.pth
  - epoch_54_iter_6875.pth
  - epoch_59_iter_7500.pth
  - epoch_64_iter_8125.pth
  - epoch_69_iter_8750.pth
  - epoch_74_iter_9375.pth
  - epoch_79_iter_10000.pth
  - epoch_84_iter_10625.pth
  - epoch_89_iter_11250.pth
  - epoch_94_iter_11875.pth
  - epoch_99.pth
  - epoch_99_iter_12500.pth
  - LOW_CE_epoch_113_iter_14250_loss_0.0575.pth
  - LOW_CE_epoch_121_iter_15250_loss_0.0375.pth
  - LOW_CE_epoch_126_iter_15875_loss_0.0559.pth
  - LOW_CE_epoch_129_iter_16250_loss_0.0573.pth
  - LOW_CE_epoch_93_iter_11750_loss_0.0386.pth

==================================================
Testing Model: epoch_104_iter_13125
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_104_iter_13125.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_104_iter_13125
epoch_104_iter_13125: Mean Dice = 0.7337, Std = 0.2381
epoch_104_iter_13125: Median Dice = 0.8139, Std = 0.2381

--------------------------------------------------
epoch_104_iter_13125: Mean HD95 = 35.5836, Std = 73.3899
epoch_104_iter_13125: Median HD95 = 8.5440, Std = 73.3899
==================================================

==================================================
Testing Model: epoch_109_iter_13750
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_109_iter_13750.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_109_iter_13750
epoch_109_iter_13750: Mean Dice = 0.7413, Std = 0.2377
epoch_109_iter_13750: Median Dice = 0.8175, Std = 0.2377

--------------------------------------------------
epoch_109_iter_13750: Mean HD95 = 31.0773, Std = 66.8304
epoch_109_iter_13750: Median HD95 = 8.2462, Std = 66.8304
==================================================

==================================================
Testing Model: epoch_114_iter_14375
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_114_iter_14375.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_114_iter_14375
epoch_114_iter_14375: Mean Dice = 0.7392, Std = 0.2357
epoch_114_iter_14375: Median Dice = 0.8130, Std = 0.2357

--------------------------------------------------
epoch_114_iter_14375: Mean HD95 = 32.2114, Std = 70.2989
epoch_114_iter_14375: Median HD95 = 8.0623, Std = 70.2989
==================================================

==================================================
Testing Model: epoch_119_iter_15000
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_119_iter_15000.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_119_iter_15000
epoch_119_iter_15000: Mean Dice = 0.7458, Std = 0.2305
epoch_119_iter_15000: Median Dice = 0.8163, Std = 0.2305

--------------------------------------------------
epoch_119_iter_15000: Mean HD95 = 29.6504, Std = 64.4507
epoch_119_iter_15000: Median HD95 = 8.0623, Std = 64.4507
==================================================

==================================================
Testing Model: epoch_124_iter_15625
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_124_iter_15625.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_124_iter_15625
epoch_124_iter_15625: Mean Dice = 0.7409, Std = 0.2326
epoch_124_iter_15625: Median Dice = 0.8145, Std = 0.2326

--------------------------------------------------
epoch_124_iter_15625: Mean HD95 = 29.9770, Std = 65.0437
epoch_124_iter_15625: Median HD95 = 8.1542, Std = 65.0437
==================================================

==================================================
Testing Model: epoch_129_iter_16250
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_129_iter_16250.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_129_iter_16250
epoch_129_iter_16250: Mean Dice = 0.7471, Std = 0.2319
epoch_129_iter_16250: Median Dice = 0.8187, Std = 0.2319

--------------------------------------------------
epoch_129_iter_16250: Mean HD95 = 29.8864, Std = 65.3258
epoch_129_iter_16250: Median HD95 = 8.2462, Std = 65.3258
==================================================

==================================================
Testing Model: epoch_134_iter_16875
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_134_iter_16875.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_134_iter_16875
epoch_134_iter_16875: Mean Dice = 0.7357, Std = 0.2404
epoch_134_iter_16875: Median Dice = 0.8131, Std = 0.2404

--------------------------------------------------
epoch_134_iter_16875: Mean HD95 = 34.9117, Std = 71.7900
epoch_134_iter_16875: Median HD95 = 9.0471, Std = 71.7900
==================================================

==================================================
Testing Model: epoch_139_iter_17500
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_139_iter_17500.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_139_iter_17500
epoch_139_iter_17500: Mean Dice = 0.7430, Std = 0.2339
epoch_139_iter_17500: Median Dice = 0.8163, Std = 0.2339

--------------------------------------------------
epoch_139_iter_17500: Mean HD95 = 30.1785, Std = 65.5932
epoch_139_iter_17500: Median HD95 = 8.0623, Std = 65.5932
==================================================

==================================================
Testing Model: epoch_144_iter_18125
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_144_iter_18125.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_144_iter_18125
epoch_144_iter_18125: Mean Dice = 0.7346, Std = 0.2374
epoch_144_iter_18125: Median Dice = 0.8107, Std = 0.2374

--------------------------------------------------
epoch_144_iter_18125: Mean HD95 = 35.9121, Std = 72.4475
epoch_144_iter_18125: Median HD95 = 9.1046, Std = 72.4475
==================================================

==================================================
Testing Model: epoch_149
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_149.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_149
epoch_149: Mean Dice = 0.7357, Std = 0.2376
epoch_149: Median Dice = 0.8121, Std = 0.2376

--------------------------------------------------
epoch_149: Mean HD95 = 31.8905, Std = 68.2319
epoch_149: Median HD95 = 8.1818, Std = 68.2319
==================================================

==================================================
Testing Model: epoch_149_iter_18750
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_149_iter_18750.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_149_iter_18750
epoch_149_iter_18750: Mean Dice = 0.7357, Std = 0.2376
epoch_149_iter_18750: Median Dice = 0.8121, Std = 0.2376

--------------------------------------------------
epoch_149_iter_18750: Mean HD95 = 31.8905, Std = 68.2319
epoch_149_iter_18750: Median HD95 = 8.1818, Std = 68.2319
==================================================

==================================================
Testing Model: epoch_154_iter_19375
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_154_iter_19375.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_154_iter_19375
epoch_154_iter_19375: Mean Dice = 0.7325, Std = 0.2393
epoch_154_iter_19375: Median Dice = 0.8096, Std = 0.2393

--------------------------------------------------
epoch_154_iter_19375: Mean HD95 = 33.7186, Std = 70.6683
epoch_154_iter_19375: Median HD95 = 8.9443, Std = 70.6683
==================================================

==================================================
Testing Model: epoch_159_iter_20000
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_159_iter_20000.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_159_iter_20000
epoch_159_iter_20000: Mean Dice = 0.7217, Std = 0.2436
epoch_159_iter_20000: Median Dice = 0.8054, Std = 0.2436

--------------------------------------------------
epoch_159_iter_20000: Mean HD95 = 36.3708, Std = 70.5918
epoch_159_iter_20000: Median HD95 = 10.0249, Std = 70.5918
==================================================

==================================================
Testing Model: epoch_162
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_162.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_162
epoch_162: Mean Dice = 0.7305, Std = 0.2378
epoch_162: Median Dice = 0.8091, Std = 0.2378

--------------------------------------------------
epoch_162: Mean HD95 = 34.5533, Std = 72.6905
epoch_162: Median HD95 = 9.2113, Std = 72.6905
==================================================

==================================================
Testing Model: epoch_49_iter_6250
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_49_iter_6250.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_49_iter_6250
epoch_49_iter_6250: Mean Dice = 0.7307, Std = 0.2414
epoch_49_iter_6250: Median Dice = 0.8051, Std = 0.2414

--------------------------------------------------
epoch_49_iter_6250: Mean HD95 = 30.9183, Std = 65.5214
epoch_49_iter_6250: Median HD95 = 8.7441, Std = 65.5214
==================================================

==================================================
Testing Model: epoch_54_iter_6875
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_54_iter_6875.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_54_iter_6875
epoch_54_iter_6875: Mean Dice = 0.7174, Std = 0.2418
epoch_54_iter_6875: Median Dice = 0.7931, Std = 0.2418

--------------------------------------------------
epoch_54_iter_6875: Mean HD95 = 35.9845, Std = 72.3425
epoch_54_iter_6875: Median HD95 = 10.0000, Std = 72.3425
==================================================

==================================================
Testing Model: epoch_59_iter_7500
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_59_iter_7500.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_59_iter_7500
epoch_59_iter_7500: Mean Dice = 0.7290, Std = 0.2359
epoch_59_iter_7500: Median Dice = 0.7995, Std = 0.2359

--------------------------------------------------
epoch_59_iter_7500: Mean HD95 = 30.5274, Std = 64.7945
epoch_59_iter_7500: Median HD95 = 8.6023, Std = 64.7945
==================================================

==================================================
Testing Model: epoch_64_iter_8125
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_64_iter_8125.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_64_iter_8125
epoch_64_iter_8125: Mean Dice = 0.7240, Std = 0.2425
epoch_64_iter_8125: Median Dice = 0.8013, Std = 0.2425

--------------------------------------------------
epoch_64_iter_8125: Mean HD95 = 34.0232, Std = 70.3870
epoch_64_iter_8125: Median HD95 = 8.9443, Std = 70.3870
==================================================

==================================================
Testing Model: epoch_69_iter_8750
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_69_iter_8750.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_69_iter_8750
epoch_69_iter_8750: Mean Dice = 0.7060, Std = 0.2523
epoch_69_iter_8750: Median Dice = 0.7933, Std = 0.2523

--------------------------------------------------
epoch_69_iter_8750: Mean HD95 = 40.1026, Std = 74.9968
epoch_69_iter_8750: Median HD95 = 10.0573, Std = 74.9968
==================================================

==================================================
Testing Model: epoch_74_iter_9375
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_74_iter_9375.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_74_iter_9375
epoch_74_iter_9375: Mean Dice = 0.7401, Std = 0.2325
epoch_74_iter_9375: Median Dice = 0.8104, Std = 0.2325

--------------------------------------------------
epoch_74_iter_9375: Mean HD95 = 28.6624, Std = 61.1903
epoch_74_iter_9375: Median HD95 = 8.4853, Std = 61.1903
==================================================

==================================================
Testing Model: epoch_79_iter_10000
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_79_iter_10000.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_79_iter_10000
epoch_79_iter_10000: Mean Dice = 0.7333, Std = 0.2400
epoch_79_iter_10000: Median Dice = 0.8127, Std = 0.2400

--------------------------------------------------
epoch_79_iter_10000: Mean HD95 = 33.0936, Std = 70.4477
epoch_79_iter_10000: Median HD95 = 8.6023, Std = 70.4477
==================================================

==================================================
Testing Model: epoch_84_iter_10625
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_84_iter_10625.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_84_iter_10625
epoch_84_iter_10625: Mean Dice = 0.7430, Std = 0.2300
epoch_84_iter_10625: Median Dice = 0.8127, Std = 0.2300

--------------------------------------------------
epoch_84_iter_10625: Mean HD95 = 28.8617, Std = 63.3359
epoch_84_iter_10625: Median HD95 = 8.0623, Std = 63.3359
==================================================

==================================================
Testing Model: epoch_89_iter_11250
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_89_iter_11250.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_89_iter_11250
epoch_89_iter_11250: Mean Dice = 0.7398, Std = 0.2333
epoch_89_iter_11250: Median Dice = 0.8101, Std = 0.2333

--------------------------------------------------
epoch_89_iter_11250: Mean HD95 = 32.3396, Std = 67.1093
epoch_89_iter_11250: Median HD95 = 9.0000, Std = 67.1093
==================================================

==================================================
Testing Model: epoch_94_iter_11875
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_94_iter_11875.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_94_iter_11875
epoch_94_iter_11875: Mean Dice = 0.7516, Std = 0.2207
epoch_94_iter_11875: Median Dice = 0.8153, Std = 0.2207

--------------------------------------------------
epoch_94_iter_11875: Mean HD95 = 27.4172, Std = 61.2378
epoch_94_iter_11875: Median HD95 = 8.0000, Std = 61.2378
==================================================

==================================================
Testing Model: epoch_99
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_99.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_99
epoch_99: Mean Dice = 0.7294, Std = 0.2457
epoch_99: Median Dice = 0.8088, Std = 0.2457

--------------------------------------------------
epoch_99: Mean HD95 = 30.0303, Std = 63.4618
epoch_99: Median HD95 = 8.6023, Std = 63.4618
==================================================

==================================================
Testing Model: epoch_99_iter_12500
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\epoch_99_iter_12500.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: epoch_99_iter_12500
epoch_99_iter_12500: Mean Dice = 0.7294, Std = 0.2457
epoch_99_iter_12500: Median Dice = 0.8088, Std = 0.2457

--------------------------------------------------
epoch_99_iter_12500: Mean HD95 = 30.0303, Std = 63.4618
epoch_99_iter_12500: Median HD95 = 8.6023, Std = 63.4618
==================================================

==================================================
Testing Model: LOW_CE_epoch_113_iter_14250_loss_0.0575
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_113_iter_14250_loss_0.0575.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: LOW_CE_epoch_113_iter_14250_loss_0.0575
LOW_CE_epoch_113_iter_14250_loss_0.0575: Mean Dice = 0.7397, Std = 0.2332
LOW_CE_epoch_113_iter_14250_loss_0.0575: Median Dice = 0.8102, Std = 0.2332

--------------------------------------------------
LOW_CE_epoch_113_iter_14250_loss_0.0575: Mean HD95 = 29.9982, Std = 64.7857
LOW_CE_epoch_113_iter_14250_loss_0.0575: Median HD95 = 8.0623, Std = 64.7857
==================================================

==================================================
Testing Model: LOW_CE_epoch_121_iter_15250_loss_0.0375
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_121_iter_15250_loss_0.0375.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: LOW_CE_epoch_121_iter_15250_loss_0.0375
LOW_CE_epoch_121_iter_15250_loss_0.0375: Mean Dice = 0.7313, Std = 0.2344
LOW_CE_epoch_121_iter_15250_loss_0.0375: Median Dice = 0.8044, Std = 0.2344

--------------------------------------------------
LOW_CE_epoch_121_iter_15250_loss_0.0375: Mean HD95 = 31.8384, Std = 67.1647
LOW_CE_epoch_121_iter_15250_loss_0.0375: Median HD95 = 8.6023, Std = 67.1647
==================================================

==================================================
Testing Model: LOW_CE_epoch_126_iter_15875_loss_0.0559
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_126_iter_15875_loss_0.0559.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: LOW_CE_epoch_126_iter_15875_loss_0.0559
LOW_CE_epoch_126_iter_15875_loss_0.0559: Mean Dice = 0.7433, Std = 0.2307
LOW_CE_epoch_126_iter_15875_loss_0.0559: Median Dice = 0.8117, Std = 0.2307

--------------------------------------------------
LOW_CE_epoch_126_iter_15875_loss_0.0559: Mean HD95 = 31.7125, Std = 67.2290
LOW_CE_epoch_126_iter_15875_loss_0.0559: Median HD95 = 8.5440, Std = 67.2290
==================================================

==================================================
Testing Model: LOW_CE_epoch_129_iter_16250_loss_0.0573
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_129_iter_16250_loss_0.0573.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: LOW_CE_epoch_129_iter_16250_loss_0.0573
LOW_CE_epoch_129_iter_16250_loss_0.0573: Mean Dice = 0.7471, Std = 0.2319
LOW_CE_epoch_129_iter_16250_loss_0.0573: Median Dice = 0.8187, Std = 0.2319

--------------------------------------------------
LOW_CE_epoch_129_iter_16250_loss_0.0573: Mean HD95 = 29.8864, Std = 65.3258
LOW_CE_epoch_129_iter_16250_loss_0.0573: Median HD95 = 8.2462, Std = 65.3258
==================================================

==================================================
Testing Model: LOW_CE_epoch_93_iter_11750_loss_0.0386
==================================================
Loading model from: model/TU_GF7224/TU_pretrain_R50-ViT-B_16_skip3_epo163_bs25_lr0.001_224_s42\LOW_CE_epoch_93_iter_11750_loss_0.0386.pth
Starting inference...
Testing on 1035 samples
==================================================
Results For Model: LOW_CE_epoch_93_iter_11750_loss_0.0386
LOW_CE_epoch_93_iter_11750_loss_0.0386: Mean Dice = 0.7343, Std = 0.2378
LOW_CE_epoch_93_iter_11750_loss_0.0386: Median Dice = 0.8104, Std = 0.2378

--------------------------------------------------
LOW_CE_epoch_93_iter_11750_loss_0.0386: Mean HD95 = 29.5207, Std = 64.2300
LOW_CE_epoch_93_iter_11750_loss_0.0386: Median HD95 = 8.2462, Std = 64.2300
==================================================
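The per-checkpoint blocks above aggregate two per-sample metrics over the 1035 test tiles: the Dice coefficient (overlap between predicted and ground-truth building masks) and HD95 (95th-percentile symmetric surface distance, in pixels). TransUNet's test utilities compute these via medpy; an equivalent self-contained sketch with NumPy/SciPy:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def dice_score(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0


def hd95(pred, gt):
    """95th percentile of the symmetric surface distances, in pixels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    if not pred.any() or not gt.any():
        return np.nan  # undefined when one mask is empty

    def surface(mask):
        # Boundary pixels: in the mask but not in its erosion.
        return mask & ~binary_erosion(mask)

    # Distance from each surface pixel of one mask to the other's surface.
    d_pred_to_gt = distance_transform_edt(~surface(gt))[surface(pred)]
    d_gt_to_pred = distance_transform_edt(~surface(pred))[surface(gt)]
    return np.percentile(np.hstack([d_pred_to_gt, d_gt_to_pred]), 95)
```

HD95 uses the 95th percentile rather than the maximum surface distance, so a handful of stray pixels does not dominate the boundary-error score.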

Model Comparison Results (Sorted by Mean Dice):
| # | Model | Mean Dice | Std Dice | Mean HD95 | Std HD95 | Median Dice | Median HD95 | Max Dice | Min Dice |
|---|-------|-----------|----------|-----------|----------|-------------|-------------|----------|----------|
| 23 | epoch_94_iter_11875 | 0.751649 | 0.220656 | 27.417227 | 61.237768 | 0.815338 | 8.000000 | 1.0 | 0.0 |
| 5 | epoch_129_iter_16250 | 0.747051 | 0.231935 | 29.886420 | 65.325750 | 0.818664 | 8.246211 | 1.0 | 0.0 |
| 29 | LOW_CE_epoch_129_iter_16250_loss_0.0573 | 0.747051 | 0.231935 | 29.886420 | 65.325750 | 0.818664 | 8.246211 | 1.0 | 0.0 |
| 3 | epoch_119_iter_15000 | 0.745775 | 0.230493 | 29.650399 | 64.450662 | 0.816257 | 8.062258 | 1.0 | 0.0 |
| 28 | LOW_CE_epoch_126_iter_15875_loss_0.0559 | 0.743278 | 0.230691 | 31.712509 | 67.229009 | 0.811669 | 8.544004 | 1.0 | 0.0 |
| 7 | epoch_139_iter_17500 | 0.742970 | 0.233910 | 30.178537 | 65.593173 | 0.816336 | 8.062258 | 1.0 | 0.0 |
| 21 | epoch_84_iter_10625 | 0.742963 | 0.229957 | 28.861673 | 63.335946 | 0.812687 | 8.062258 | 1.0 | 0.0 |
| 1 | epoch_109_iter_13750 | 0.741331 | 0.237707 | 31.077298 | 66.830365 | 0.817526 | 8.246211 | 1.0 | 0.0 |
| 4 | epoch_124_iter_15625 | 0.740945 | 0.232636 | 29.977040 | 65.043682 | 0.814515 | 8.154234 | 1.0 | 0.0 |
| 19 | epoch_74_iter_9375 | 0.740073 | 0.232473 | 28.662412 | 61.190262 | 0.810391 | 8.485281 | 1.0 | 0.0 |
| 22 | epoch_89_iter_11250 | 0.739835 | 0.233307 | 32.339582 | 67.109268 | 0.810083 | 9.000000 | 1.0 | 0.0 |
| 26 | LOW_CE_epoch_113_iter_14250_loss_0.0575 | 0.739662 | 0.233171 | 29.998160 | 64.785662 | 0.810197 | 8.062258 | 1.0 | 0.0 |
| 2 | epoch_114_iter_14375 | 0.739219 | 0.235722 | 32.211425 | 70.298897 | 0.813012 | 8.062258 | 1.0 | 0.0 |
| 9 | epoch_149 | 0.735745 | 0.237571 | 31.890521 | 68.231875 | 0.812070 | 8.181828 | 1.0 | 0.0 |
| 10 | epoch_149_iter_18750 | 0.735745 | 0.237571 | 31.890521 | 68.231875 | 0.812070 | 8.181828 | 1.0 | 0.0 |
| 6 | epoch_134_iter_16875 | 0.735659 | 0.240446 | 34.911670 | 71.790029 | 0.813118 | 9.047077 | 1.0 | 0.0 |
| 8 | epoch_144_iter_18125 | 0.734607 | 0.237366 | 35.912058 | 72.447460 | 0.810662 | 9.104633 | 1.0 | 0.0 |
| 30 | LOW_CE_epoch_93_iter_11750_loss_0.0386 | 0.734310 | 0.237822 | 29.520738 | 64.229981 | 0.810378 | 8.246211 | 1.0 | 0.0 |
| 0 | epoch_104_iter_13125 | 0.733686 | 0.238100 | 35.583605 | 73.389912 | 0.813859 | 8.544004 | 1.0 | 0.0 |
| 20 | epoch_79_iter_10000 | 0.733316 | 0.239987 | 33.093608 | 70.447702 | 0.812729 | 8.602325 | 1.0 | 0.0 |
| 11 | epoch_154_iter_19375 | 0.732486 | 0.239264 | 33.718607 | 70.668271 | 0.809594 | 8.944272 | 1.0 | 0.0 |
| 27 | LOW_CE_epoch_121_iter_15250_loss_0.0375 | 0.731289 | 0.234403 | 31.838361 | 67.164692 | 0.804395 | 8.602325 | 1.0 | 0.0 |
| 14 | epoch_49_iter_6250 | 0.730672 | 0.241366 | 30.918297 | 65.521412 | 0.805051 | 8.744138 | 1.0 | 0.0 |
| 13 | epoch_162 | 0.730517 | 0.237756 | 34.553314 | 72.690472 | 0.809128 | 9.211336 | 1.0 | 0.0 |
| 24 | epoch_99 | 0.729376 | 0.245724 | 30.030342 | 63.461827 | 0.808831 | 8.602325 | 1.0 | 0.0 |
| 25 | epoch_99_iter_12500 | 0.729376 | 0.245724 | 30.030342 | 63.461827 | 0.808831 | 8.602325 | 1.0 | 0.0 |
| 16 | epoch_59_iter_7500 | 0.729024 | 0.235870 | 30.527404 | 64.794543 | 0.799470 | 8.602325 | 1.0 | 0.0 |
| 17 | epoch_64_iter_8125 | 0.723984 | 0.242519 | 34.023163 | 70.387038 | 0.801342 | 8.944272 | 1.0 | 0.0 |
| 12 | epoch_159_iter_20000 | 0.721729 | 0.243553 | 36.370824 | 70.591781 | 0.805362 | 10.024938 | 1.0 | 0.0 |
| 15 | epoch_54_iter_6875 | 0.717371 | 0.241837 | 35.984473 | 72.342546 | 0.793067 | 10.000000 | 1.0 | 0.0 |
| 18 | epoch_69_iter_8750 | 0.706041 | 0.252250 | 40.102620 | 74.996808 | 0.793301 | 10.057284 | 1.0 | 0.0 |
Best performing model: epoch_94_iter_11875
Mean Dice: 0.7516
Mean HD95: 27.4172
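The comparison table itself is a straightforward pandas aggregation over the per-checkpoint result arrays. A sketch of how it could be assembled; the two-checkpoint `results` dict below is a hypothetical placeholder standing in for the real 1035-sample arrays:

```python
import pandas as pd

# Hypothetical per-sample metrics for two checkpoints (placeholder data).
results = {
    "epoch_94_iter_11875": {"dice": [0.81, 0.75, 0.70], "hd95": [8.0, 27.4, 40.0]},
    "epoch_69_iter_8750":  {"dice": [0.79, 0.70, 0.62], "hd95": [10.1, 40.1, 75.0]},
}

rows = []
for name, m in results.items():
    dice, hd = pd.Series(m["dice"]), pd.Series(m["hd95"])
    rows.append({
        "Model": name,
        "Mean Dice": dice.mean(), "Std Dice": dice.std(),
        "Mean HD95": hd.mean(), "Std HD95": hd.std(),
        "Median Dice": dice.median(), "Median HD95": hd.median(),
        "Max Dice": dice.max(), "Min Dice": dice.min(),
    })

# Rank checkpoints by Mean Dice, best first.
comparison = pd.DataFrame(rows).sort_values("Mean Dice", ascending=False)
best = comparison.iloc[0]["Model"]
```

Sorting by Mean Dice in descending order and taking `iloc[0]` reproduces the "Best performing model" selection above.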
In [44]:
comparison.round(2) # Round the comparison DataFrame for better readability
Out[44]:
| # | Model | Mean Dice | Std Dice | Mean HD95 | Std HD95 | Median Dice | Median HD95 | Max Dice | Min Dice |
|---|-------|-----------|----------|-----------|----------|-------------|-------------|----------|----------|
| 23 | epoch_94_iter_11875 | 0.75 | 0.22 | 27.42 | 61.24 | 0.82 | 8.00 | 1.0 | 0.0 |
| 5 | epoch_129_iter_16250 | 0.75 | 0.23 | 29.89 | 65.33 | 0.82 | 8.25 | 1.0 | 0.0 |
| 29 | LOW_CE_epoch_129_iter_16250_loss_0.0573 | 0.75 | 0.23 | 29.89 | 65.33 | 0.82 | 8.25 | 1.0 | 0.0 |
| 3 | epoch_119_iter_15000 | 0.75 | 0.23 | 29.65 | 64.45 | 0.82 | 8.06 | 1.0 | 0.0 |
| 28 | LOW_CE_epoch_126_iter_15875_loss_0.0559 | 0.74 | 0.23 | 31.71 | 67.23 | 0.81 | 8.54 | 1.0 | 0.0 |
| 7 | epoch_139_iter_17500 | 0.74 | 0.23 | 30.18 | 65.59 | 0.82 | 8.06 | 1.0 | 0.0 |
| 21 | epoch_84_iter_10625 | 0.74 | 0.23 | 28.86 | 63.34 | 0.81 | 8.06 | 1.0 | 0.0 |
| 1 | epoch_109_iter_13750 | 0.74 | 0.24 | 31.08 | 66.83 | 0.82 | 8.25 | 1.0 | 0.0 |
| 4 | epoch_124_iter_15625 | 0.74 | 0.23 | 29.98 | 65.04 | 0.81 | 8.15 | 1.0 | 0.0 |
| 19 | epoch_74_iter_9375 | 0.74 | 0.23 | 28.66 | 61.19 | 0.81 | 8.49 | 1.0 | 0.0 |
| 22 | epoch_89_iter_11250 | 0.74 | 0.23 | 32.34 | 67.11 | 0.81 | 9.00 | 1.0 | 0.0 |
| 26 | LOW_CE_epoch_113_iter_14250_loss_0.0575 | 0.74 | 0.23 | 30.00 | 64.79 | 0.81 | 8.06 | 1.0 | 0.0 |
| 2 | epoch_114_iter_14375 | 0.74 | 0.24 | 32.21 | 70.30 | 0.81 | 8.06 | 1.0 | 0.0 |
| 9 | epoch_149 | 0.74 | 0.24 | 31.89 | 68.23 | 0.81 | 8.18 | 1.0 | 0.0 |
| 10 | epoch_149_iter_18750 | 0.74 | 0.24 | 31.89 | 68.23 | 0.81 | 8.18 | 1.0 | 0.0 |
| 6 | epoch_134_iter_16875 | 0.74 | 0.24 | 34.91 | 71.79 | 0.81 | 9.05 | 1.0 | 0.0 |
| 8 | epoch_144_iter_18125 | 0.73 | 0.24 | 35.91 | 72.45 | 0.81 | 9.10 | 1.0 | 0.0 |
| 30 | LOW_CE_epoch_93_iter_11750_loss_0.0386 | 0.73 | 0.24 | 29.52 | 64.23 | 0.81 | 8.25 | 1.0 | 0.0 |
| 0 | epoch_104_iter_13125 | 0.73 | 0.24 | 35.58 | 73.39 | 0.81 | 8.54 | 1.0 | 0.0 |
| 20 | epoch_79_iter_10000 | 0.73 | 0.24 | 33.09 | 70.45 | 0.81 | 8.60 | 1.0 | 0.0 |
| 11 | epoch_154_iter_19375 | 0.73 | 0.24 | 33.72 | 70.67 | 0.81 | 8.94 | 1.0 | 0.0 |
| 27 | LOW_CE_epoch_121_iter_15250_loss_0.0375 | 0.73 | 0.23 | 31.84 | 67.16 | 0.80 | 8.60 | 1.0 | 0.0 |
| 14 | epoch_49_iter_6250 | 0.73 | 0.24 | 30.92 | 65.52 | 0.81 | 8.74 | 1.0 | 0.0 |
| 13 | epoch_162 | 0.73 | 0.24 | 34.55 | 72.69 | 0.81 | 9.21 | 1.0 | 0.0 |
| 24 | epoch_99 | 0.73 | 0.25 | 30.03 | 63.46 | 0.81 | 8.60 | 1.0 | 0.0 |
| 25 | epoch_99_iter_12500 | 0.73 | 0.25 | 30.03 | 63.46 | 0.81 | 8.60 | 1.0 | 0.0 |
| 16 | epoch_59_iter_7500 | 0.73 | 0.24 | 30.53 | 64.79 | 0.80 | 8.60 | 1.0 | 0.0 |
| 17 | epoch_64_iter_8125 | 0.72 | 0.24 | 34.02 | 70.39 | 0.80 | 8.94 | 1.0 | 0.0 |
| 12 | epoch_159_iter_20000 | 0.72 | 0.24 | 36.37 | 70.59 | 0.81 | 10.02 | 1.0 | 0.0 |
| 15 | epoch_54_iter_6875 | 0.72 | 0.24 | 35.98 | 72.34 | 0.79 | 10.00 | 1.0 | 0.0 |
| 18 | epoch_69_iter_8750 | 0.71 | 0.25 | 40.10 | 75.00 | 0.79 | 10.06 | 1.0 | 0.0 |